Quality Engineering Reference Hub

We stand on the shoulders of giants.
Now it is our turn.

"Every Cpk we compute, every control chart we plot, every FMEA we run — these are not bureaucratic checkboxes. They are acts of responsibility. Somewhere at the end of our supply chain is a person who will use what we make. They trust us, without knowing us, to have done the work properly."

— Quality Datalabs

Deming, Juran, Shewhart, Taguchi, Ishikawa — they spent lifetimes building the statistical and philosophical foundations of quality. Their tools are not old. They are permanent. Ours to use, teach, and pass forward.

This reference was built because quality knowledge should be accessible, precise, and free — not locked behind expensive textbooks or five-day seminars. Whether you are running a PFMEA at midnight, explaining Ppk to your manager, or diving deep into reliability analysis, this is for you.

15+ modules · SSBB-level depth · 41 distributions · Free, always
02 · AIAG 4th Ed.
Measurement System Analysis
S.W.I.P.E. error model, stability, bias & linearity, GR&R via X̄-R & ANOVA, torque wrench drift case study.

03 · Management (New)
Quality Philosophy
Deming, Juran, Crosby, Ishikawa. PDCA, DMAIC, Lean frameworks, strategic planning, facilitation tools.

04 · Quality Systems
ISO 9001 → IATF 16949 maturity, PPAP levels, special characteristics, 8D problem solving, escalation models, customer-specific requirements.

05 · Process Control
Statistical Process Control
Cp/Cpk/Ppk, chart selector decision tree, annotated out-of-control patterns, Western Electric rules.

Interactive · Live
DPMO & Capability Calculator
Enter LSL, USL, mean, sigma. Instantly compute DPMO, sigma level, Cpk, Cp, defect probability.

07 · Reliability
Reliability Engineering
MTBF/MTTR/Availability formulas, full bathtub curve SVG, Weibull β shape cards, series & parallel systems.

08 · Distributions (New)
Statistical Distributions
Normal, Weibull, Exponential, Lognormal, Binomial, Chi-square, Poisson, t, F — formulas, properties, applications.

09 · Defense (New)
Military & Defense Standards
MIL-STD-1629A FMECA, MIL-HDBK-217F reliability prediction, ANSI Z1.4 sampling, AS9100D, AQAP-2110.

10 · Statistics (New)
Applied Statistics
Hypothesis testing, confidence intervals, regression, ANOVA, chi-square — with quality engineering examples.

11 · AIAG-VDA 2019
FMEA & RPN
DFMEA and PFMEA structure, S/O/D scales, live RPN calculator, action priority matrix.

12 · Risk (New)
Risk Management
ISO 31000 framework, risk matrix construction, bow-tie diagrams, failure mode prioritization.

13 · Experimentation (New)
Design of Experiments
Full factorial, fractional factorial, Taguchi orthogonal arrays, main effects, interaction plots, ANOVA.

14 · DFSS/SE (New)
Design for Six Sigma
DMADV roadmap, VOC to CTQ, concept selection, DOE optimisation, tolerance design, full worked example — from brief to production.
Core Philosophy
Quality is a responsibility, not a checkbox.

As we stand on the shoulders of giants, we have a responsibility to be better — to strive continuously for quality products reaching the customer. Every engineer carries the trust of the end user, someone they will never meet, who relies on the work being done properly.

Good engineering is not enough if a competitor offers the customer a better alternative. The goal is to empower every quality engineer to make better decisions, ship better products, and uphold the responsibility we carry — to the customer, to the craft, and to those who came before us.

Statistics & Process Quality

Six Sigma & DPMO

From normal distribution tails to defect probability — how sigma level, specification limits, and the 1.5σ long-term convention translate into real manufacturing quality targets.

Six Sigma Metrics Toolkit — DPU, DPO, DPMO, Yield & RTY

Before you can improve a process, you must be able to measure it precisely. Six Sigma uses a tightly connected family of metrics that scale from a single unit all the way to a million-opportunity benchmark. This tab gives you every formula, example, and visual you need.

📊 Six Sigma Metrics — How They Connect
DPU (defects per unit) → DPO (defects per opportunity) → DPMO = DPO × 1,000,000 → Sigma (Z) = Φ⁻¹(1 − DPO) → FPY / Yield → RTY = Y₁ × Y₂ × Y₃ × …

① DPU — Defects Per Unit

DPU = Total Defects ÷ Total Units

The simplest defect metric — average number of defects found on each unit regardless of how many opportunities for failure each unit had. DPU of 0.15 means roughly 1 defect per 7 units.

Example: 75 defects found across 500 units → DPU = 75/500 = 0.15
Limitation: DPU ignores complexity. A complex PCB and a simple bracket both become "one unit." Use DPO for cross-process comparison.

② DPO — Defects Per Opportunity

DPO = Total Defects ÷ (Total Units × Opportunities per Unit)

Normalises the defect rate by the number of distinct ways a unit can fail. Enables fair comparison between processes of different complexity. An "opportunity" is any characteristic that could be measured and found defective.

Example: 75 defects, 500 units, 4 opportunities each → DPO = 75/(500×4) = 0.0375
Defining opportunities consistently is critical: counting too many opportunities dilutes DPO, while counting too few inflates it.

③ DPMO — Defects Per Million Opportunities

DPMO = DPO × 1,000,000

Scales DPO to a per-million basis, making tiny defect rates intuitive and industry-comparable. The Six Sigma world-class target is 3.4 DPMO — accounting for the 1.5σ long-term drift of a real process.

Example: DPO 0.0375 → DPMO = 0.0375 × 1,000,000 = 37,500 DPMO → approximately 3.3σ process
308,537
66,807
6,210
233
3.4 ★

④ FPY — First Pass Yield

FPY = Good Units ÷ Total Units  (equal to 1 − DPO only when each unit has a single defect opportunity)

The percentage of units that complete a process step without any rework, repair, or scrap. FPY declining is often the first visible signal that hidden rework costs are accumulating. A plant can show high throughput but terrible FPY if rework is baked into the process.

Example: 460 good units from 500 → FPY = 460/500 = 92%  (8% hidden rework cost)

⑤ RTY — Rolled Throughput Yield

RTY = FPY₁ × FPY₂ × FPY₃ × … × FPYₙ

RTY multiplies yields across all process steps. Even individually high-yield steps compound to a much lower overall throughput. This is the metric that exposes the true cumulative cost of a multi-step process and shows why Six Sigma targets perfection at each step.

📊 RTY Compounding — 3-Step Process
Step 1 (FPY₁ = 98%) × Step 2 (FPY₂ = 95%) × Step 3 (FPY₃ = 97%) → RTY = 0.98 × 0.95 × 0.97 ≈ 90.3%

Three steps each at ≥95% FPY combine to only 90.3% RTY. Nearly 1 in 10 units has a defect somewhere in the process. RTY forces the question: where is the quality loss occurring?
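The whole metric family is a few lines of arithmetic. Below is a minimal Python sketch using the worked numbers from this tab (75 defects, 500 units, 4 opportunities each, 460 good units); the function name and structure are illustrative, not from any standard library.

```python
from math import prod

def six_sigma_metrics(defects, units, opportunities_per_unit, good_units):
    """Per-unit and per-opportunity defect metrics from raw counts."""
    dpu = defects / units                              # ① 75/500 = 0.15
    dpo = defects / (units * opportunities_per_unit)   # ② 75/2000 = 0.0375
    dpmo = dpo * 1_000_000                             # ③ 37,500
    fpy = good_units / units                           # ④ 460/500 = 0.92
    return dpu, dpo, dpmo, fpy

print(six_sigma_metrics(75, 500, 4, 460))

# ⑤ RTY compounds the per-step yields: 0.98 × 0.95 × 0.97 ≈ 0.903
rty = prod((0.98, 0.95, 0.97))
```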

⑥ Converting DPMO to Sigma Level (Z)

Z_LT = Φ⁻¹(1 − DPO)  |  Z_ST = Z_LT + 1.5 (the reported sigma level)

Sigma level (Z) is derived from DPO using the inverse normal CDF. Short-term Z always looks better; the 1.5σ shift accounts for long-term process drift. A process measuring 4.5σ against its specs over the long term is reported as "6 Sigma" because the convention assumes short-term performance is 1.5σ better than the long-term data show.

# DPMO → Z_LT | Formula: Z_LT = Φ⁻¹(1 − DPMO ÷ 1,000,000)
# Z_ST = Z_LT + 1.5σ (short-term is always 1.5σ better)
DPMO = 317,311 → Z_LT ≈ 1.00σ
DPMO = 45,500  → Z_LT ≈ 2.00σ
DPMO = 37,500  → Z_LT ≈ 1.78σ  ← our worked example
DPMO = 2,700   → Z_LT ≈ 3.00σ
DPMO = 233     → Z_LT ≈ 3.50σ
DPMO = 63.3    → Z_LT ≈ 4.00σ
DPMO = 3.4     → Z_LT ≈ 4.50σ  ← "6 Sigma" long-term (with 1.5σ shift)
DPMO = 0.002   → Z_LT = 6.00σ  ← true 6σ short-term, centred
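The same conversion in code, a small sketch assuming SciPy is available and using the one-sided tail convention of the formula above:

```python
from scipy.stats import norm

def dpmo_to_sigma(dpmo, shift=1.5):
    """Long-term Z from a one-sided tail DPMO, plus the reported sigma level."""
    z_lt = norm.ppf(1 - dpmo / 1_000_000)   # inverse normal CDF, Φ⁻¹
    return z_lt, z_lt + shift

print(dpmo_to_sigma(3.4))      # ≈ (4.50, 6.00): the famous "6 Sigma"
print(dpmo_to_sigma(37_500))   # ≈ (1.78, 3.28): the worked example above
```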

⑦ Quick Reference — Diagnostic Signals

DPU ↑ → DPMO ↑ → Sigma ↓

More defects = lower process capability. Focus improvement on the highest DPMO step first.

Cp High, Cpk Low

Process spread is fine but the mean is off-center. Fix centering before reducing σ.

RTY ↓ → Process Loss

Compounded yield drop exposes hidden rework cost. Drill into which step has the lowest FPY.

FPY ↓ → Rework Accumulating

Units leaving a step with defects silently inflate cost. FPY below 95% warrants immediate DMAIC attention.

How the Normal Distribution Creates DPMO

Every manufacturing process produces outputs that vary. When plotted, most processes follow a normal distribution — a symmetric bell curve where values cluster near the mean (µ) and tail off toward the extremes.

The specification limits define the acceptable range. Any output beyond LSL or USL is a defect. DPMO = the area of both red tails × 1,000,000.

📊 Anatomy of the Normal Distribution
(figure) Bell curve of process output X with LSL and USL marked: ±1σ covers 68.27% of output, ±2σ covers 95.45%; the shaded tails beyond each spec limit are defects.

Step-by-Step: µ and σ → DPMO

  • 1

    Standardize to Z

    Z = (X − µ)/σ — converts any measurement to "how many standard deviations from the mean?" Z ~ N(0,1).

  • 2

    Find Z at each spec limit

Z_USL = (USL − µ)/σ   Z_LSL = (µ − LSL)/σ. Distance from mean to each spec in σ units.

  • 3

    Compute both tail areas

p = [1 − Φ(Z_USL)] + [1 − Φ(Z_LSL)] — the red shaded areas on both sides of the curve.

  • 4

    Scale to DPMO

    DPMO = p × 1,000,000. Each additional σ level reduces DPMO by ~100×–1000×.
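A minimal sketch of steps 1 through 4, again assuming SciPy; the inputs anticipate the wall-thickness case study later in this section:

```python
from scipy.stats import norm

def dpmo_from_spec(mu, sigma, lsl, usl):
    """Standardize both spec limits, sum the two tail areas, scale to DPMO."""
    z_usl = (usl - mu) / sigma          # step 2: distance to upper spec
    z_lsl = (mu - lsl) / sigma          # step 2: distance to lower spec
    p = (1 - norm.cdf(z_usl)) + (1 - norm.cdf(z_lsl))   # step 3: both tails
    return p * 1_000_000                # step 4: scale to per-million

print(dpmo_from_spec(2.500, 0.00833, 2.450, 2.550))   # ≈ 0.002 (centred 6σ)
print(dpmo_from_spec(2.5125, 0.00833, 2.450, 2.550))  # ≈ 3.4 (+1.5σ drift)
```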

How the Metrics Connect — Follow the Flow

Inputs:
  • Spec limits: USL, LSL (the engineering requirement)
  • Process centre: µ, the mean (where the process aims)
  • Process spread: σ, the standard deviation (how much it varies)

Distance to each spec in σ units (both tails):
  • Z_USL = (USL − µ)/σ
  • Z_LSL = (µ − LSL)/σ

Tail probability (defect fraction): p = [1 − Φ(Z_USL)] + [1 − Φ(Z_LSL)], the combined red shaded area under both tails of the bell curve

DPMO = p × 1,000,000 (defects per million opportunities)

Sigma level: Z = Φ⁻¹(1 − p) + 1.5 (long-term sigma, with 1.5σ shift)

Related capability indices:
  • Cp (potential) = (USL − LSL)/(6σ): can the process fit?
  • Cpk (actual) = min(Cpu, Cpl): is it centred too?
  • Ppk (long-term) = min(Ppu, Ppl): with all sources of variation
💡

DPMO is per opportunity. If one unit has 5 weld joints and each is one "opportunity," unit defect rate ≠ DPMO. Always define what "one opportunity" means before comparing across processes.

🔑 Key Definitions

  • DPMO

    Defects Per Million Opportunities — normalizes defect rates for fair comparison across different process complexities.

  • Φ(z)

    Standard normal CDF — cumulative area under the bell curve to the left of z. Tail = 1 − Φ(z).

  • Sigma Level (Z)

    Distance from process mean to nearest spec in standard deviations. Higher = better quality.

  • True 6σ Centered

    Two-sided DPMO ≈ 0.002. Roughly 1 defect per 507 million opportunities.

Plastic Housing Wall Thickness: 2.450 – 2.550 mm

A precision injection-moulded housing for an electronic sensor. The design team has set a tight wall-thickness specification to ensure structural integrity and correct fit. LSL = 2.450 mm, USL = 2.550 mm — a bilateral tolerance of ±0.050 mm. Your task: determine whether the current process is capable, and what happens when it drifts.

The Scenario

Production data from 200 parts shows: µ = 2.500 mm (centred), σ = 0.00833 mm. A process audit later reveals mean drift to µ = 2.5125 mm — a +1.5σ shift typical of long-term process behaviour.

Step A — Compute σ Required for True 6σ

Tolerance → σ relationship
d = (2.550 − 2.450) / 2
= 0.050 mm (half-tolerance)
σ = d / Ztarget = 0.050 / 6
= 0.00833 mm required for true 6σ

Step B — Centred Process (Short-term, µ = 2.500 mm)

Z-score calculation — both spec limits
Z_USL = (2.550 − 2.500) / 0.00833 = 0.050 / 0.00833 = 6.000
Z_LSL = (2.500 − 2.450) / 0.00833 = 0.050 / 0.00833 = 6.000
p = 2 × [1 − Φ(6.000)] = 2 × 9.866×10⁻¹⁰
DPMO = 0.002  |  Sigma level = 6.0σ (ST)

Step C — After +1.5σ Drift (µ = 2.5125 mm)

Mean has drifted — Z-scores are now asymmetric
Z_USL = (2.550 − 2.5125) / 0.00833 = 0.0375 / 0.00833 = 4.500
Z_LSL = (2.5125 − 2.450) / 0.00833 = 0.0625 / 0.00833 = 7.500
p = [1 − Φ(4.5)] + [1 − Φ(7.5)] ≈ 3.398×10⁻⁶ + ~0
DPMO = 3.4  |  Sigma level = 4.5σ (LT)

Step D — Capability Indices

Cp — Potential
2.000
(USL−LSL)/(6σ) = 0.1/0.05
Cpk — Short-term
2.000
Centred → Cp = Cpk
Cpk — After Drift
1.500
min(4.5/3, 7.5/3) = 1.5
📊 Centred 6σ vs +1.5σ Shifted Process (2.450–2.550 mm spec)
(figure) LSL 2.450 and USL 2.550 marked; the centred 6σ curve at µ = 2.500 (Cpk = 2.0) sits fully inside the limits, while the +1.5σ drifted curve (Cpk = 1.5) pushes its upper tail toward the USL, giving ⚠ 3.4 DPMO.
⚠️

Even a well-designed 6σ process accumulates drift over time. This is why Six Sigma reports two separate numbers: short-term Cp/Cpk (from a tightly controlled study) and long-term Ppk (from production data including all sources of variation). Always specify which you are reporting.

📋 Process Summary

Parameter | Value
Feature | Wall thickness
LSL | 2.450 mm
USL | 2.550 mm
µ (centred) | 2.500 mm
σ (at 6σ) | 0.00833 mm
Cp | 2.000
Cpk (centred) | 2.000
DPMO (centred) | 0.002
µ after +1.5σ drift | 2.5125 mm
Z_USL (drifted) | 4.500
Cpk (drifted) | 1.500

🔑 What This Tells You

  • Cp = 2.0 — the tolerance window is twice what the process spread needs. Excellent potential.
  • Cpk = 2.0 (centred) — the process is hitting its potential. World class.
  • Cpk = 1.5 (drifted) — still very capable, but DPMO jumped from ~0 to 3.4.
  • This is why control charts matter — to catch drift before it escalates.

The 1.5σ Shift — Why "3.4 DPMO at 6σ"?

The famous 3.4 DPMO figure comes from a single assumption: real-world processes drift by approximately 1.5σ over the long term due to tool wear, raw material shifts, and environmental changes.

📊 Short-term 6σ becomes Long-term 4.5σ to the Nearest Spec
(figure) The centred short-term curve (6σ to each spec, Cpk = 2.0, DPMO ≈ 0.002, negligible) shifts +1.5σ, leaving 4.5σ to the nearest spec long-term (Cpk = 1.5, DPMO = 3.4 — the famous figure).
With 1.5σ Shift Applied — µ moves from 250 to 312.5 (the LSL = 0, USL = 500, σ = 41.667 example used in the Monte Carlo tab below)
Z_USL = (500 − 312.5) / 41.667 = 4.500
Z_LSL = (312.5 − 0) / 41.667 = 7.500
p_USL ≈ 3.398×10⁻⁶  (dominates)
p_LSL ≈ 3.186×10⁻¹⁴  (negligible)
DPMO ≈ 3.4

Cp vs Cpk — The Critical Distinction

🎯
Cp
(USL−LSL) / 6σ

Process potential. Ignores mean position. "Could it fit if centered?"

📍
Cpk
min[(USL−µ)/3σ, (µ−LSL)/3σ]

Actual capability. Accounts for mean position. Cpk ≤ Cp always.

📅
Ppk
uses σ_overall (incl. drift)

Long-term performance. Includes all variation sources including drift.

💡

Rule: Large Cp−Cpk gap = process is capable but off-center. Fix centering first before trying to reduce σ. If Cp ≥ 1.33 but Cpk < 1.33, the problem is mean position, not spread.
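A tiny sketch of the rule, reusing the wall-thickness case-study numbers; the function name is illustrative:

```python
def capability(mu, sigma, lsl, usl):
    """Cp ignores centring; Cpk penalises it. A large gap means fix centring first."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    return cp, cpk

print(capability(2.500, 0.00833, 2.450, 2.550))    # (2.00, 2.00) centred
print(capability(2.5125, 0.00833, 2.450, 2.550))   # (2.00, 1.50) after +1.5σ drift
```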

⚖️ ST vs LT Sigma

ST Z | LT Z | LT DPMO
3.0σ | 1.5σ | 66,807
4.0σ | 2.5σ | 6,210
5.0σ | 3.5σ | 233
6.0σ | 4.5σ | 3.4

Sigma Level ↔ DPMO Reference

The sigma-DPMO relationship is exponential — each additional sigma level cuts DPMO by one to three orders of magnitude. The visual below makes this concrete.

📊 DPMO at Each Sigma Level — Relative Scale (log-mapped to bar width)
(bar chart; each bar's width is log-mapped to its DPMO — the values are tabulated in full below)
Sigma (Z) | 1-sided DPMO | 2-sided DPMO | LT DPMO (+1.5σ) | Defect % | Yield %
1σ | 158,655 | 317,311 | 697,672 | 31.73% | 68.27%
2σ | 22,750 | 45,500 | 308,537 | 4.55% | 95.45%
3σ | 1,350 | 2,700 | 66,807 | 0.27% | 99.73%
4σ | 31.67 | 63.34 | 6,210 | 0.0063% | 99.9937%
5σ | 0.287 | 0.573 | 233 | 0.000057% | 99.99994%
6σ | 0.000987 | 0.00197 | 3.4 | 2.0×10⁻⁷% | 99.9999998%
7σ | 1.28×10⁻⁶ | 2.56×10⁻⁶ | 0.019 | ~0 | ~100%

Monte Carlo Simulation

Monte Carlo generates thousands of random N(µ,σ) samples and counts how often they fall outside spec limits. It validates analytical DPMO and teaches tail probability concepts visually — especially useful for non-normal processes.

```python
import numpy as np

# ── Specification limits ──
LSL, USL = 0, 500

# ── Process parameters ──
mu = 250         # mean (centred)
sigma = 41.667   # σ = 250/6 for 6σ
N = 1_000_000    # simulation size

# ── Centred process ──
x = np.random.normal(mu, sigma, N)
defects = np.sum((x < LSL) | (x > USL))
dpmo = defects / N * 1_000_000

# ── +1.5σ shifted process ──
x2 = np.random.normal(mu + 1.5 * sigma, sigma, N)
dpmo_shifted = np.sum((x2 < LSL) | (x2 > USL)) / N * 1e6
```

Simulation Results (N = 400,000)

Case | µ | Defects (N=400K) | Est. DPMO | Analytical
Centered 6σ | 250 | 0 | 0.000 | 0.00197
+1.5σ Shifted | 312.5 | 2 | 5.0 | 3.4
📖

Zero defects in 400,000 samples at true 6σ is correct, not a bug. You'd need ~500 million samples to reliably observe a single 6σ defect. For extreme sigma levels, analytical methods are far more practical than simulation.

When Simulation Beats Analytical Methods

  • When the process distribution is non-normal (skewed, bimodal, truncated)
  • When multiple interacting dimensions or GD&T stackups are involved
  • When teaching the effect of mean shift, σ reduction, or spec change visually

🎲 Required Sample Size

Sigma | Need N ≥
5σ | 175M
6σ | 507M

Rule of thumb: N ≥ 10/p for reliable estimation. Use analytical at 5σ+.

DMAIC — The Five-Phase Process Improvement Roadmap

DMAIC is the backbone of every Six Sigma project. It takes a problem through five sequential phases — each with specific tools and deliverables — to arrive at a sustainable solution that eliminates root cause rather than treating symptoms.

📊 DMAIC Process Flow — Problem to Sustainable Solution
Problem input → D DEFINE (scope · charter · SIPOC) → M MEASURE (process map · FMEA · σ) → A ANALYZE (hypothesis · ANOVA) → I IMPROVE (DOE · pilot · solutions) → C CONTROL (SPC · control plan) → Solution
D — DEFINE
Link the problem to organisational priorities and secure management commitment

Starts with COPQ/Pareto analysis to identify and prioritise the problem. SIPOC diagram scopes the project boundaries (7–8 key process steps). Ends with a signed charter containing problem statement, goal, scope, estimated savings, team, and timeline.

VOC / CTQ Tree
SIPOC Diagram
Project Charter
M — MEASURE
Establish the current baseline and validate the measurement system

A Y=F(X) process map identifies all inputs and outputs. FMEA quantifies risk by RPN. Gage R&R validates measurement before collecting capability data. The phase ends with a confirmed baseline sigma level (Cpk) and an accepted measurement system.

Y=F(X) Map
FMEA / RPN
GR&R / MSA
Process Sigma
A — ANALYZE
Identify and validate root causes with data — not opinions

Hypothesis tests (t-test, ANOVA) compare means between conditions. Correlation and regression reveal input-output relationships. 5-Whys and Ishikawa structure the cause-and-effect thinking. The phase ends with statistically validated root causes.

Hypothesis Test
ANOVA
Regression
5-Whys / Ishikawa
I — IMPROVE
Develop, test and implement solutions that address root causes

Design of Experiments (DOE) maps the relationship between input factors and output responses, finding optimal operating conditions. Solutions are piloted before full rollout. The Improve phase ends with a statistically significant improvement in the baseline metric.

DOE / RSM
Pugh Matrix
Piloting
Poka-Yoke
C — CONTROL
Sustain the gains and prevent regression to the old process

SPC charts monitor the improved process in real-time. A Control Plan documents what to measure, how often, and what action to take on signals. Updated FMEA, process maps, and SOPs transfer ownership back to the process team. Project savings are calculated and reported.

SPC Charts
Control Plan
Updated FMEA
Final Report
💡

DMAIC is not always needed. If a problem already has a known solution and action plan, it is an implementation project — just execute the plan. DMAIC is reserved for problems where the root cause is genuinely unknown.

Splitting the DMAIC — Four Focused Paths to Improvement

Full DMAIC training covers dozens of tools across five phases. Research into successful projects shows that four common paths account for the vast majority of real improvements. Each path has a clear objective, a targeted tool set, and a repeatable sequence. Matching the right path to the right problem dramatically increases success rate.

📊 Four DMAIC Paths — Match Your Problem to Its Path
DMAIC (D · M · A · I · C) splits into four focused paths:
  • Reduce Variability: MSA → SPC → Cpk (target: Cpk ≥ 1.33)
  • Reduce Failures: FMEA → MTBF → TPM (target: OEE ≥ 85%)
  • Reduce Waste: VSM → 5S → Kanban (target: lead time ↓)
  • Reduce Defects: Pareto → 5-Why → Poka-Yoke (target: DPMO ↓↓↓)
PATH 1 — Reduce Variability

Goal: Achieve stable, predictable, capable output (Cpk ≥ 1.33)

This is the heart of classic Six Sigma — SPC was its original tool. Key insight: don't start with the control chart. First validate the measurement system, then characterise the process, then chart it. Starting with charts on an unvalidated measurement system is a very common and costly mistake.

Process Map → I/O Matrix → Specs / Targets → SOPs → MSA (R&R + Stability + Linearity) → Potential Study (Cpk) → Control Chart → Eliminate Special Causes → Reduce Common Causes → Capture & Standardise
PATH 2 — Reduce Failures

Goal: Increase machine/process uptime and throughput

Targets machine breakdowns and availability losses. Asset Utilization (AU) waterfall charts identify the top loss categories. Component matrices link failure modes to parts. Weibull analysis predicts failure timing and drives condition-based maintenance strategy.

Define Target Process → AU Loss Waterfall → Component Matrix → Failure Modes → Maintenance Strategy → Autonomous Maintenance → Growth Tracking → Capture
PATH 3 — Reduce Waste (Lean)

Goal: Eliminate the 8 wastes — TIMWOOD + Skills

Value Stream Mapping reveals waste across the flow. 5S eliminates inventory and motion waste. Kanban controls overproduction. QFD aligns specs to customer need — often revealing specs that are unnecessarily tight (over-processing waste). The Lean path is the newest and most popular, but its pitfalls can trap unwary teams when it is applied to the wrong problem type.

Transport
Inventory
Motion
Waiting
Overproduction
Over-processing
Defects
+ Skills
PATH 4 — Reduce Defects

Goal: Drive defect frequency to zero

Defects are things that shouldn't be there at all — unlike variability, there is no optimal level other than zero. This is the widest, most common DMAIC path. Its tools are simple and accessible to anyone at any belt level: Pareto to prioritise, Fishbone/5-Whys to find cause, Poka-Yoke to prevent recurrence, and standardisation to sustain.

Define Defect → Measure Frequency (Pareto) → Flow Diagram → Root Cause (5-Why + Fishbone) → Solution Matrix → Pilot → Full Implementation → Standardise → Verify Results
⚠️

Path selection principle (Quick, 2019): Management sets the goal and links it to KPIs. Teams never choose their own projects — projects without management linkage lose resources to crises. The need should drive the method, just as form follows function.

COPQ & Project Selection — Linking Six Sigma to Business Results

Every Six Sigma project must be tied to real business cost — otherwise it competes with day-to-day operations and loses. The Cost of Poor Quality (COPQ) framework ensures projects are prioritised by financial impact, not by seniority or gut feeling.

📊 Cost of Quality — The Four Buckets
Cost of Quality (COQ), the total quality-related spend, splits into:
  • Cost of Good Quality (COGQ) — conformance costs: Prevention (training · audit · design) and Appraisal (inspection · calibration)
  • Cost of Poor Quality (COPQ) — non-conformance / failure costs: Internal Failure (scrap · rework · downtime) and External Failure (returns · warranty · recalls)

Six Sigma projects must be identified from internal failure and external failure categories first — these directly impact bottom-line results. Prevention spending typically returns 3–5× its cost by reducing the failure categories. "Gating the defect" — catching quality issues in-house before they reach the customer — is a fundamental discipline.

Multi-Level Pareto — Drilling to Project Scope

A single Pareto identifies the biggest problem category. A second-level Pareto drills into that category. If a problem appears in the top 3 at both levels — by frequency and cost — it is the ideal project candidate.

📊 Two-Level Pareto — From Symptom to Project
Level 1 — Problem Category: Bad Data 45% · Inefficient Apps 34% · Vendors 20% · Missing Spec 5% → drill into Bad Data
Level 2 — Bad Data by Day: Thursday 43% · Friday 30% · Wednesday 18% · Tuesday 7%
→ Project: fix bad data entry on Thursdays (43% of the Bad Data category)

Project Charter — The Contract Between Team and Management

Required Charter Elements
  • ✓ Problem statement (what, where, when, magnitude)
  • ✓ Measurable goal with deadline
  • ✓ Scope — start & end point, in/out of scope
  • ✓ Team members & roles
  • ✓ Estimated savings from COPQ analysis
  • ✓ Timeline with phase gate milestones
  • ✓ Management signature (resource commitment)
Common Project Selection Failures
  • ✗ Choosing the hardest problem (years-old issue)
  • ✗ Selecting an already-approved capital project
  • ✗ No link to financial impact or KPIs
  • ✗ Team members choose their own projects
  • ✗ Scope too broad — "reduce all defects"
  • ✗ No management sign-off or resource commitment
  • ✗ Renaming existing firefighting as a DMAIC project

Selection rule (Shankar, ASQ 2009): Start from external failure costs, then internal failure costs. Problems with the highest combined frequency and cost across multiple Pareto levels are the ideal candidates. The data dictates priority — not management preference or the loudest voice in the room.

AIAG MSA 4th Edition (June 2010)

Measurement System Analysis

Before trusting process data, trust your measurement system. MSA quantifies how much observed variation is process — and how much is just the gauge. Every PPAP, every SPC chart, every capability index depends on getting this right first.

σ²_observed = σ²_actual + σ²_MSA

MSA Variation Taxonomy — The Complete Tree

Every measurement you take contains two fundamentally different kinds of variation. Understanding their structure is the foundation of all MSA work. The tree below shows the complete decomposition — from total observed variation down to each individual error source.

🌳 Measurement System Variation — Full Taxonomy Tree

Total Observed Variation: σ²_observed = σ²_process + σ²_measurement
  • Process Variation — Part-to-Part (σ²_p): true part-to-part differences; drives SPC & capability indices (the signal we want to see)
  • Measurement Variation — Gauge / MSA Error (σ²_ms):
      – Accuracy (systematic / trueness error): Bias (offset from true value) · Linearity (bias varies by part size) · Stability (bias drifts over time)
      – Precision (random / scatter error): Repeatability (same operator, same part) · Reproducibility (between operators) → GR&R (Gage R&R = repeatability + reproducibility)

Accuracy vs Precision — The Core Distinction

ACCURACY — Systematic Error

How close measurements are to the true reference value. Accuracy errors are consistent — they shift every reading in the same direction. A perfectly precise gauge can still be completely inaccurate.

Three components: Bias · Linearity · Stability
PRECISION — Random Error

How close repeated measurements are to each other. Precision errors are random — they scatter results around some central value. High precision doesn't guarantee accuracy; a precise gauge can be precisely wrong.

Two components: Repeatability · Reproducibility → GR&R

The Five MSA Error Components Explained

① BIAS — Accuracy Component

The systematic offset from true value

Bias is the difference between the observed average measurement and the reference/true value for the same part. A gauge with positive bias reads high consistently; negative bias reads low. It is measured by comparing the gauge average against a known reference standard (master part).

Bias = X̄_observed − Reference_Value  |  %Bias = Bias / Process_Variation × 100

Cause: Worn gauge, incorrect calibration, wrong reference standard, elastic deformation of gauge or part.

② LINEARITY — Accuracy Component

Bias that changes across the measurement range

Linearity asks: "Is bias the same at low values as at high values?" A gauge may read accurately near 5mm but overread near 25mm. Linearity is assessed by measuring multiple reference parts spread across the full operating range and plotting bias vs. reference value. The slope of the regression line is the linearity error.

Linearity = slope × Process_Variation  |  %Linearity = |slope| × PV × 100

Cause: Gauge not calibrated across full range, non-linear amplifier response, mechanical wear concentrated at one end of travel.

③ STABILITY — Accuracy Component

Bias drift over time

Stability (also called drift) measures whether the gauge's accuracy changes over time. A stable gauge produces the same average reading on a reference part measured today, next week, and next month. It is assessed by measuring a master part periodically and charting the averages on an Individuals (XmR) control chart. An out-of-control point signals a stability problem.

Stability = |Bias_time1 − Bias_time2|  |  Monitored via XmR chart on reference part

Cause: Thermal drift, electrical component aging, mechanical wear, contamination, re-calibration interval too long.

④ REPEATABILITY (EV) — Precision Component

Within-operator scatter — Equipment Variation

Repeatability is the variation obtained when one operator measures the same part multiple times under the same conditions. It represents the fundamental noise floor of the instrument — the best the gauge can possibly do. AIAG calls this Equipment Variation (EV). Even with a perfect operator technique, a poor gauge yields high repeatability error.

σ²_repeatability = MS_repeatability (ANOVA)  |  EV = R̄ × K₁ (Avg & Range)

Reduces with: Gauge overhaul, reducing environmental noise, better fixturing, increased resolution. This is the component that can ONLY be improved by instrument upgrade.

⑤ REPRODUCIBILITY (AV) — Precision Component

Between-operator scatter — Appraiser Variation

Reproducibility is the variation in measurement averages obtained by different operators measuring the same part with the same gauge. It captures differences in technique, fixture loading, data reading habits, and environmental sensitivity. AIAG calls this Appraiser Variation (AV). High AV tells you training and procedure standardisation is the priority — not a new gauge.

σ²_reproducibility = (MS_operator − MS_repeatability) / (n×p)  |  GR&R = √(EV² + AV²)

Reduces with: Operator training, written measurement procedures (SOP), better fixtures, fixture gauging to remove human positioning variation.

💡

AIAG Priority Rule: Always resolve accuracy problems (Bias → Linearity → Stability) before running a GR&R study. A biased or drifting gauge will corrupt your GR&R data. Recalibrate first, then study precision.

Three Methods to Quantify GR&R (Precision)

📏
Average & Range
Manual calculation using ranges. Easy but uses std dev not variance — percentages don't add to 100%. Not recommended by AIAG or Wheeler.
📊
ANOVA
Uses variance components. Detects operator×part interaction. AIAG-preferred. % of total variance sums to 100%.
🎯
EMP (Wheeler)
Evaluating the Measurement Process. Uses control charts + intraclass correlation ρ. Classifies gauge as 1st–4th Class Monitor.

See the GR&R — 3 Methods and EMP Method tabs for full worked examples using the AIAG 4th Edition reference dataset (3 operators × 10 parts × 3 trials).

The Fundamental MSA Equation

Every measurement you take is the sum of two things: what the process actually produced, and the noise your gauge added. MSA separates them.

📊 Variance Decomposition — The Core MSA Identity
σ²_observed (what you see) = σ²_actual (true process) + σ²_MSA (gauge error). GRR inflates observed variation: a bad gauge makes a capable process look incapable.

Total Observed Variance: σ²_obs = σ²_actual + σ²_GRR
%GRR (% of study): %GRR = 100 × σ_GRR / σ_obs
ndc (distinct categories): ndc = 1.41 × (σ_p / σ_GRR), must be ≥ 5 for adequate discrimination
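In code, both headline statistics fall out of the variance components directly. A sketch with illustrative inputs (not the AIAG reference values):

```python
import math

def grr_summary(var_grr, var_parts):
    """%GRR of total study variation and the number of distinct categories."""
    sd_grr, sd_parts = math.sqrt(var_grr), math.sqrt(var_parts)
    sd_total = math.sqrt(var_grr + var_parts)   # σ²_obs = σ²_actual + σ²_GRR
    pct_grr = 100 * sd_grr / sd_total
    ndc = int(1.41 * sd_parts / sd_grr)         # truncated; must be ≥ 5
    return pct_grr, ndc

print(grr_summary(0.09, 1.21))   # illustrative inputs → ≈ (26.3%, 5)
```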

S.W.I.P.E. — The Five Error Sources (AIAG 4th Ed.)

S
Standard
Reference value, NIST traceability chain, master calibration. An operational definition: same meaning to supplier and customer, yesterday and today.
W
Workpiece
Part geometry, surface finish, within-part variation. If the wrong variable is measured, no level of precision helps.
I
Instrument
Gage design, discrimination, maintenance. The 10-to-1 rule: discrimination must be ≤ 1/10 of process variation (not tolerance).
P
Person
Appraiser technique, training, skill. The most common error source in manual measurement and in product/process qualification studies.
E
Environment
Temperature, humidity, vibration, cleanliness. The most common error source in highly automated measurement systems.

AIAG Mandatory Sequence — Never Skip

① Stability — drift over time? → ② Linearity — bias varies by size? → ③ Bias — constant offset? → ④ GR&R study — only if ①–③ pass

Discrimination — The 10-to-1 Rule

The 4th Edition updated this rule: instrument discrimination must be at most 1/10 of the process variation (σ × 6), not 1/10 of the tolerance. This reflects the philosophy of process-focused quality — the process, not the spec, drives measurement requirements.

ndc | Ability | Use case
1 | Go/no-go only | Cannot distinguish values. Control only if large Cp and flat loss function.
2–4 | Coarse estimation | Semi-variable control only. Cannot reliably estimate process parameters.
≥ 5 | Adequate | Can be used with variables control charts. AIAG minimum requirement.
≥ 10 | Excellent | Full analytical resolution. No discrimination concerns.
⚠️

Deming's Funnel / Tampering Warning (AIAG 4th Ed. Ch. I-B): A measurement system with large variation causes operators to adjust processes that don't need adjustment. Autocompensation that adjusts by the last result (Rule 2) adds variation — the exact opposite of its intent. Never adjust a stable process based on a single measurement.

🔑 Key Definitions (AIAG 4th Ed.)

  • Bias

    Difference between observed average and reference value. Systematic error. Assessed by t-test: H₀: bias=0 at α=0.05.

  • Stability (Drift)

    Change in bias over time. Tracked with X̄&R control charts on a reference part. Must be confirmed FIRST.

  • Linearity

    Change in bias over the operating range. Regression: slope=0 (H₀) tested at α=0.05. 5 parts covering full range.

  • Repeatability (EV)

    One appraiser, same part, same gage. Equipment Variation. Within-system error.

  • Reproducibility (AV)

    Different appraisers, same gage, same part. Appraiser Variation. Between-system error.

  • GR&R

    GRR² = EV² + AV². The combined measurement system capability estimate.

  • Measurement Uncertainty

    Different from MSA. MSA = understand sources. Uncertainty = range expected to contain true value. True = Observed ± U.

Bias Study — Independent Sample Method

Tests H₀: bias = 0. The calculated average bias is evaluated to determine if it could be due to random sampling variation — or if there is a true systematic offset that needs recalibration.

Step-by-Step Procedure

  • 1

    Establish Reference Value

    Send part to metrology lab or measure n≥10 times with higher-order instrument. Average = reference value. Choose a part near mid-range of production variation.

  • 2

    Collect Measurements

    Measure the same part n≥10 times under normal conditions by the lead operator.

  • 3

    Check Repeatability First

    %EV = 100[σ_r / TV]. If %EV is large, fix repeatability before continuing — bias test assumes acceptable repeatability.

  • 4

    Compute t-statistic

    t = bias / σ_b where σ_b = σ_r / √n. Reject H₀ if |t| > t(α/2, n−1). Default α = 0.05.

  • 5

    Check CI Contains Zero

    bias ± t(0.025, n−1) × σ_b. If zero is within CI → bias is acceptable.

Worked Example — AIAG MSA 4th Ed. (p.90–91)

📋

Reference value = 6.00. n = 15 readings by lead operator. Expected process variation (σ) = 2.5.

📊 AIAG Bias Study Data — 15 Readings (Reference = 6.00)
Readings:   5.8, 5.7, 5.9, 5.9, 6.0, 6.1, 6.0, 6.1, 6.4, 6.3, 6.0, 6.1, 6.2, 5.6, 6.0
Deviations: −0.2, −0.3, −0.1, −0.1, 0.0, +0.1, 0.0, +0.1, +0.4, +0.3, 0.0, +0.1, +0.2, −0.4, 0.0
AIAG 4th Ed. Bias Analysis — Table III-B 2
Inputs
Reference = 6.00
x̄ = 6.0067 (15 readings)
Bias = 6.0067 − 6.00 = +0.0067
Significance Test
σ_repeatability = 0.2120
σ_b = σ_repeatability / √15 = 0.0547
t = Bias / σ_b = 0.122
t_crit (14 df, α = 0.05) = 2.145
|t| < t_crit → bias is NOT statistically significant

Result from AIAG 4th Ed.: The bias is statistically acceptable. Zero falls within the 95% CI of (−0.1107, +0.1241). The measurement system can proceed to GR&R study.

Common Causes of Non-Zero Bias

  • Instrument needs calibration (most common)
  • Worn instrument, equipment, or fixture
  • Worn or damaged master; error in master
  • Instrument made to wrong dimension
  • Instrument measuring the wrong characteristic
  • Instrument correction algorithm incorrect

📋 Bias Study Summary

Parameter | Value
Reference Value | 6.000
X̄ (15 readings) | 6.0067
Bias | +0.0067
σ_r (repeatability) | 0.2120
%EV | 8.5%
t_stat | 0.122
t_critical (α=0.05) | 2.145
95% CI lower | −0.1107
95% CI upper | +0.1241
Zero in CI? | YES ✓
Verdict | ACCEPTABLE
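The entire study reproduces in a few lines, assuming NumPy and SciPy are available; the results match the CI in the table above:

```python
import numpy as np
from scipy import stats

readings = np.array([5.8, 5.7, 5.9, 5.9, 6.0, 6.1, 6.0, 6.1,
                     6.4, 6.3, 6.0, 6.1, 6.2, 5.6, 6.0])
reference = 6.00

bias = readings.mean() - reference                  # +0.0067
sigma_r = readings.std(ddof=1)                      # 0.2120 (repeatability)
sigma_b = sigma_r / np.sqrt(readings.size)          # standard error of the bias
t_stat = bias / sigma_b                             # 0.122
t_crit = stats.t.ppf(0.975, df=readings.size - 1)   # 2.145
ci = (bias - t_crit * sigma_b, bias + t_crit * sigma_b)   # (−0.1107, +0.1241)
# |t| < t_crit and zero inside the CI → bias is statistically acceptable
```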

Linearity Study

Linearity = bias that changes with the size of the part being measured. A gage may be perfectly accurate at one point in its range and badly biased at another. Tests if the slope of bias vs. reference value equals zero.

How to Conduct (AIAG 4th Ed.)

  • 1

    Select 5 Parts Across Full Range

    Choose g ≥ 5 parts whose measurements, due to process variation, cover the full operating range of the gage.

  • 2

    Establish Reference Values

    Have each part measured by layout inspection. Confirm the gage's operating range is fully covered.

  • 3

    Measure m ≥ 10 Times Each

    One operator, same gage, random order (to prevent recall bias).

  • 4

    Regression Analysis

Fit bias = intercept + slope × reference. Test H₀: slope = 0 (no linearity) AND H₀: intercept = 0 (no constant bias). Both must pass.

Worked Example — AIAG MSA 4th Ed. (Table III-B 4)

Part | Ref. Value | Avg Bias | Verdict
1 | 2.00 | +0.507 | Large positive bias
2 | 4.00 | — | —
3 | 6.00 | +0.083 | Near zero
4 | 8.00 | — | —
5 | 10.00 | −0.614 | Large negative bias
AIAG 4th Ed. Linearity Analysis — Table III-B 5
Regression: Bias = intercept + slope × Ref
slope = −0.1429
intercept = 0.8373
Significance: t_slope = −3.116, p < 0.05 → linearity is significant
R² = 0.3266
🔴

AIAG conclusion: This measurement system has a linearity problem. The bias starts large and positive at small part sizes and switches to large negative at large sizes. The gage must be recalibrated across its full range before use. Cannot be used for product/process analysis in this condition.

Graphical Pass/Fail Rule

Plot bias vs reference value with best-fit line and confidence bands. For linearity to be acceptable, the "bias = 0" horizontal line must lie entirely within the confidence bands of the fitted regression line. If the zero line exits the bands at any point — linearity problem exists regardless of numerical results.

📊 Linearity vs Bias at a Glance

(figure) Bias plotted against reference value: a flat line offset from bias = 0 is constant bias (OK, fix by calibration); a sloped line is linearity error (must fix across the range).
📌

Constant bias can be corrected by recalibration. A linearity error requires hardware or software modification across the full operating range.

Stability Study — Change in Bias Over Time

A stable gauge gives the same bias today as it did last month. Stability must be confirmed with X̄&R control charts on a reference part before any GR&R study begins — an unstable system produces meaningless GR&R results.

Procedure

  • 1

    Select Reference Part

    Near mid-range of production variation. Establish reference value from lab/higher-order system. May want masters at low, mid, and high range — separate charts for each.

  • 2

    Periodic Measurement

    Measure the reference part n=3–5 times per period. Weekly or daily, depending on expected drift rate. Plan ≥20 subgroups before final assessment.

  • 3

    X̄&R Control Charts

    Plot and analyze. Look for: trends, shifts, out-of-control signals, cycles. No specific %Stability index — analysis is through control chart interpretation.

  • 4

    Pass / Fail

    Stable = no OOC signals, no trends. Unstable = any OOC signal, trend, or systematic drift. Do not proceed to GR&R until stable.

AIAG Example — Stability Study Data (Figure III-B 1)

From AIAG MSA 4th Ed., Figure III-B 1 — Stability Study
Reference value = 6.00
X̄ chart limits: UCL = 6.11  |  LCL = 5.72
R chart limits: UCL_R = 0.73  |  LCL_R = 0
All points within limits → measurement system stable
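As a sketch of the chart arithmetic behind such a study, the block below computes X̄&R limits from synthetic subgroup data standing in for the periodic reference-part readings; the data are invented for illustration, while A₂, D₃, D₄ are the standard Shewhart constants for n = 5:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical stability data: reference part measured 5 times per week, 20 weeks
subgroups = rng.normal(loc=6.00, scale=0.07, size=(20, 5))

xbar = subgroups.mean(axis=1)
r = subgroups.max(axis=1) - subgroups.min(axis=1)
xbarbar, rbar = xbar.mean(), r.mean()

A2, D3, D4 = 0.577, 0.0, 2.114   # Shewhart constants for subgroup size n = 5
ucl_x, lcl_x = xbarbar + A2 * rbar, xbarbar - A2 * rbar
ucl_r, lcl_r = D4 * rbar, D3 * rbar

out_of_control = (xbar > ucl_x) | (xbar < lcl_x) | (r > ucl_r)
# Any True here → investigate and restore stability before running GR&R
```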

Stability vs Other MSA Properties

Property | Varies with | Study design | Chart type | Order
Stability | TIME | Same part, time changes | X̄&R over time | ① First
Linearity | RANGE | Different parts, same time | Regression plot | ② Second
Bias | — | Same part, single session | Histogram + CI | ③ Third
GR&R | — | Multiple parts, appraisers, trials | X̄&R or ANOVA | ④ Last
⚠️

No specific %Stability threshold exists in AIAG 4th Ed. The manual explicitly states: "Other than normal control chart analyses, there is no specific numerical analysis or index for stability." Pass/fail is entirely based on control chart interpretation. The torque wrench example from our other module uses a calculated percentage — that is a customer-specific metric, not an AIAG standard.

🔧 Why Stability First?

If the bias is changing over time while you conduct a GR&R study, your results are meaningless. The study will reflect a snapshot of a moving target — not the true long-term measurement system capability.

🚨

GR&R on an unstable system = wasted effort. Calibrate, investigate, and restore stability first.

Possible causes of instability
  • Wear in measurement equipment
  • Damaged or worn standard/master
  • Temperature / humidity cycling
  • Electronic drift in sensors
  • Spring fatigue (torque wrenches)
  • Contamination or lubricant buildup

GR&R Study Methods — X̄-R Method

The X̄-R method (Average and Range) is the automotive industry standard: 3 appraisers × 10 parts × 2–3 trials, randomised order. Cannot detect appraiser-by-part interaction, but well understood and widely accepted for PPAP.

Complete AIAG Example (Table III-B 15/16)

GRR Study Setup — 3 Appraisers × 10 Parts × 3 Trials
Key ANOVA Results
Parts F = 128.93 (p < 0.001)
Appraisers F = 0.424 (p = 0.661)
Interaction F = 0.434 (p = 0.850)
Variance Components
σ²_repeatability = 0.04007
σ²_reproducibility = 0.00456
σ²_GRR = 0.04463
σ²_parts = 0.17020

GRR Acceptance Zones

✓ <10% — Acceptable
⚠ 10–30% — May be acceptable, depending on application importance and cost
✗ >30% — Unacceptable

Three Accepted Methods

Range method: Uses ranges from pairs of measurements. Provides only combined GRR — cannot separate EV from AV. Not acceptable for PPAP submission. Used for quick initial screening to see if a formal study is warranted.

GRR = R̄ / d₂* where d₂* depends on sample size and number of subgroups.

Average & Range (X̄-R) method: 3 appraisers × 10 parts × 2–3 trials, random order. Uses control chart constants K₁, K₂, K₃ to separate EV and AV. Cannot estimate appraiser-by-part interaction. Most common in PPAP packages.

ANOVA method: Most statistically powerful. Handles any experimental setup. Detects appraiser-by-part interaction — a source the X̄-R method misses. Decomposes: Parts, Appraisers, Interaction, Equipment. AIAG recommends this method when a computer is available.

What GRR Diagnostics Tell You

Finding | Root Cause | Action
EV large vs AV | Instrument problem | Maintenance, redesign, fix clamping
AV large vs EV | Appraiser technique differs | Retrain, clarify procedure, add fixture
Interaction significant | Appraisers handle parts differently | Standardise measurement procedure
ndc = 1 or 2 | Poor discrimination | Upgrade gauge resolution

📋 Study Results (AIAG Example)

Source | StdDev | %TV
EV (Repeat.) | 0.202 | 17.6%
AV (Reprod.) | 0.230 | 20.0%
GRR Total | 0.306 | 26.7%
PV (Parts) | 1.104 | 96.4%
TV | 1.146 | 100%
ndc | 5
⚠️

At 26.7% GRR, this system is in the "may be acceptable" zone. AIAG says decision should be based on application importance and cost.

ANOVA Method — Same Data, Better Results

ANOVA on the same 3×10×3 dataset detects whether appraiser-by-part interaction is significant — something the X̄-R method simply cannot see. When interaction is non-significant, results are pooled into the equipment term.

The ANOVA Table (AIAG Table III-B 7)

Source | DF | SS | MS | F | Significant?
Appraiser | 2 | 3.1673 | 1.58363 | 34.44 | Yes (α=0.05)
Parts | 9 | 88.3619 | 9.81799 | 213.52 | Yes (α=0.05)
Appraiser×Part | 18 | 0.3590 | 0.01994 | 0.434 | NO — pooled
Equipment | 60 | 2.7589 | 0.04598 | — | —
Total | 89 | 94.6471 | — | — | —
ANOVA Pooling Decision — Interaction Non-Significant
Interaction F = 0.434 < F_critical → pool with Equipment
Pooled MS_equipment = (SS_interaction + SS_repeatability) / (df_int + df_rep)
= (0.3590 + 2.7589) / (18 + 60) = 0.0400
σ²_repeatability = MS_equipment = 0.0400

ANOVA vs X̄-R: Side-by-Side

Method | EV | AV | GRR | %GRR | Interaction
X̄-R Method | 0.202 | 0.230 | 0.306 | 26.7% | Cannot detect
ANOVA | 0.200 | 0.227 | 0.302 | — | 0 (not significant)

Results are very close — this is expected when interaction is non-significant. ANOVA gives slightly more accurate estimates due to better partitioning. The key ANOVA advantage is detecting the interaction term.

💡

When does interaction matter? If the interaction term were significant (parallel lines on interaction plot = no interaction; crossing lines = interaction), it would indicate different appraisers handle different parts inconsistently — a training or fixture problem specific to certain part geometries.

📊 Graphical Tools — ANOVA

  • Interaction Plot

    Appraiser avg per part vs part number. Parallel lines = no interaction. Crossing lines = interaction present.

  • Error Charts

    Individual deviations from reference. Appraiser A: positive bias. Appraiser C: negative bias (from AIAG example).

  • Whiskers Chart

    High/low/average per part per appraiser. Reveals inconsistent appraisers across different part sizes.

  • Residual Plot

    Fitted vs residual values. Check for randomness — any pattern suggests model inadequacy.

EMP Method — Evaluating the Measurement Process

The EMP methodology, developed by Dr. Donald J. Wheeler, goes beyond a simple pass/fail percentage. It uses control charts to validate the study, computes variance components (not standard deviations), and classifies your measurement system as a First, Second, Third, or Fourth Class Monitor — giving you actionable intelligence about what the gauge can actually do in production.

📖

Source: All three GR&R methods below use the AIAG 4th Edition reference dataset — 3 operators (A, B, C) × 10 parts × 3 trials each = 90 measurements total. This allows direct comparison of methods on identical data.

The AIAG Reference Dataset (Table 1)

Op. | Trial | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10
A | 1 | 0.29 | −0.56 | 1.34 | 0.47 | −0.80 | 0.02 | 0.59 | −0.31 | 2.26 | −1.36
A | 2 | 0.41 | −0.68 | 1.17 | 0.50 | −0.92 | −0.11 | 0.75 | −0.20 | 1.99 | −1.25
A | 3 | 0.64 | −0.58 | 1.27 | 0.64 | −0.84 | −0.21 | 0.66 | −0.17 | 2.01 | −1.31
B | 1 | 0.08 | −0.47 | 1.19 | 0.01 | −0.56 | −0.20 | 0.47 | −0.63 | 1.80 | −1.68
B | 2 | 0.25 | −1.22 | 0.94 | 1.03 | −1.20 | 0.22 | 0.55 | 0.08 | 2.12 | −1.62
B | 3 | 0.07 | −0.68 | 1.34 | 0.20 | −1.28 | 0.06 | 0.83 | −0.34 | 2.19 | −1.50
C | 1 | 0.04 | −1.38 | 0.88 | 0.14 | −1.46 | −0.29 | 0.02 | −0.46 | 1.77 | −1.49
C | 2 | −0.11 | −1.13 | 1.09 | 0.20 | −1.07 | −0.67 | 0.01 | −0.56 | 1.45 | −1.77
C | 3 | −0.15 | −0.96 | 0.67 | 0.11 | −1.45 | −0.49 | 0.21 | −0.49 | 1.87 | −2.16
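Because the full dataset is given, the Average & Range calculation can be reproduced directly. A NumPy sketch (the constants K₁ = 0.5908 for 3 trials, K₂ = 0.5231 for 3 appraisers, and K₃ = 0.3146 for 10 parts are the AIAG table values):

```python
import numpy as np

# AIAG 4th Ed. dataset: rows = 3 trials, columns = 10 parts, one block per operator
A = np.array([[0.29,-0.56,1.34,0.47,-0.80,0.02,0.59,-0.31,2.26,-1.36],
              [0.41,-0.68,1.17,0.50,-0.92,-0.11,0.75,-0.20,1.99,-1.25],
              [0.64,-0.58,1.27,0.64,-0.84,-0.21,0.66,-0.17,2.01,-1.31]])
B = np.array([[0.08,-0.47,1.19,0.01,-0.56,-0.20,0.47,-0.63,1.80,-1.68],
              [0.25,-1.22,0.94,1.03,-1.20,0.22,0.55,0.08,2.12,-1.62],
              [0.07,-0.68,1.34,0.20,-1.28,0.06,0.83,-0.34,2.19,-1.50]])
C = np.array([[0.04,-1.38,0.88,0.14,-1.46,-0.29,0.02,-0.46,1.77,-1.49],
              [-0.11,-1.13,1.09,0.20,-1.07,-0.67,0.01,-0.56,1.45,-1.77],
              [-0.15,-0.96,0.67,0.11,-1.45,-0.49,0.21,-0.49,1.87,-2.16]])
ops = [A, B, C]

K1, K2, K3 = 0.5908, 0.5231, 0.3146   # 3 trials, 3 appraisers, 10 parts
n, r = 10, 3

R_bar = np.mean([op.max(axis=0) - op.min(axis=0) for op in ops])      # 0.3417
EV = R_bar * K1                                                       # 0.202
x_diff = max(op.mean() for op in ops) - min(op.mean() for op in ops)  # 0.4446
AV = np.sqrt((x_diff * K2) ** 2 - EV**2 / (n * r))                    # 0.230
GRR = np.hypot(EV, AV)                                                # 0.306
Rp = np.ptp(np.mean(ops, axis=(0, 1)))   # range of the 10 part averages
PV = Rp * K3                                                          # 1.104
TV = np.hypot(GRR, PV)                                                # 1.146
print(f"%GRR = {100*GRR/TV:.1f}%, ndc = {int(1.41*PV/GRR)}")          # 26.7%, 5
```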

EMP Variance Component Formulas

Like ANOVA, EMP works in variances (not standard deviations). Subgroups are each operator×part combination (e.g., A-Part1 = {0.29, 0.41, 0.64}). The average range R̄ drives all calculations.

X̄-R Method — Step by Step
Step 1 — Repeatability (EV)
EV = R̄ × K₁
R̄ = 0.3417 (avg range of the 30 operator×part subgroups)
K₁ = 0.5908 (3 trials)
EV = 0.202
Step 2 — Reproducibility (AV)
AV = √((x̄_diff × K₂)² − EV²/(n·r))
x̄_diff = 0.4446, K₂ = 0.5231 (3 appraisers), n·r = 30
AV = 0.230

EMP Variance Results (Table 6)

Component | Variance | % of Total
Repeatability | 0.0407 | 3.1%
Reproducibility | 0.0531 | 4.1%
Product (Part-to-Part) | 1.216 | 92.8%
Total | 1.310 | 100.0%

The Intraclass Correlation Coefficient (ρ)

This is EMP's key metric — the ratio of part variance to total variance. It tells you what fraction of observed variation is real product signal vs. gauge noise.

Intraclass Correlation Coefficient ρ
ρ = σ²_p / σ²_total = 1 − (σ²_GRR / σ²_total)
ρ close to 1 → most variance is from real part differences (good). ρ close to 0 → measurement system dominates (bad).

Wheeler's Four Monitor Classes — Interpreting ρ

ρ Range | Class | Signal Reduction | Chance of Detecting ±3σ Shift | Track Process? | %R&R / AIAG
0.8 – 1.0 | First Class ★ | <10% | >99% (Rule 1) | Up to Cp₈₀ | 0–20% · Acceptable
0.5 – 0.8 | Second Class | 10–30% | >88% (Rule 1) | Up to Cp₅₀ | 20–50% · Marginal
0.2 – 0.5 | Third Class | 30–55% | >91% (Rules 1–4) | Up to Cp₂₀ | 50–80% · Unacceptable
0.0 – 0.2 | Fourth Class | >55% | Rapidly Vanishing | Unable to Track | 80–100% · Unacceptable

Adapted from EMP III: Evaluating the Measurement System, Donald J. Wheeler, SPC Press, 2006.

Our example result: ρ = 0.928 → First Class Monitor. This means less than 10% reduction in process signal, better than 99% chance of detecting a ±3σ shift with Rule 1, and the measurement system can track process improvements all the way to Cp₈₀. The gauge is excellent for SPC use.
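In code, the classification is a lookup on ρ computed from the variance components in the EMP table above; the cutoffs are Wheeler's:

```python
var_rep, var_repro, var_parts = 0.0407, 0.0531, 1.216   # from the EMP table
var_total = var_rep + var_repro + var_parts

rho = var_parts / var_total                             # ≈ 0.928
cutoffs = [(0.8, "First Class"), (0.5, "Second Class"),
           (0.2, "Third Class"), (0.0, "Fourth Class")]
monitor = next(name for lo, name in cutoffs if rho >= lo)
print(f"ρ = {rho:.3f} → {monitor} Monitor")             # First Class
```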

All Three Methods Side-by-Side (Same Data)

Source | A&R StdDev | A&R %TV (σ) | ANOVA Variance | ANOVA %TV (σ²) | EMP Variance | EMP %TV (σ²)
Repeatability | 0.202 | 17.61% | 0.0400 | 3.39% | 0.0407 | 3.1%
Reproducibility | 0.230 | 20.04% | 0.0515 | 4.37% | 0.0531 | 4.1%
R&R | 0.306 | 26.68% | 0.0914 | 7.76% | 0.0938 | 7.2%
Part-to-Part | 1.104 | 96.37% | 1.086 | 92.24% | 1.216 | 92.8%
Total | 1.146 | — | 1.178 | 100% | 1.310 | 100%
⚠️

Why the Average & Range method is misleading: Standard deviations are not additive (σ_total ≠ σ_parts + σ_ms), so the % column doesn't sum to 100% and is mathematically incorrect for decision-making. The 26.68% R&R figure from the Avg & Range method on this same data looks "marginal" under AIAG criteria, while ANOVA and EMP correctly show 7–8% — clearly acceptable. Bottom line: use ANOVA or EMP.

Which Method to Use?

Avg & Range

Only use if hand calculations are required with no software. Always convert to variance before interpreting. Not recommended.

ANOVA

AIAG-preferred. Detects operator×part interaction. Best for automated environments and PPAP submissions. Use this by default.

EMP

Adds control chart validation and the Monitor Class framework. Use when you want to understand what the gauge can actually do for process control.

Attribute Measurement System Analysis

Attribute gauges produce finite categories (pass/fail, good/bad, or colour grades). Standard GR&R methods don't apply — instead AIAG uses Cohen's Kappa for agreement and Effectiveness for decision accuracy.

Cross-Tabulation and Cohen's Kappa

Kappa measures inter-rater agreement beyond what chance alone would produce.

Cohen's Kappa Formula
κ = (p_o − p_e) / (1 − p_e)
p_o = observed agreement  |  p_e = agreement expected by chance
Interpretation
κ ≥ 0.9 → Excellent
0.7 ≤ κ < 0.9 → Acceptable
κ < 0.7 → Inadequate — investigate
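A self-contained sketch of the κ arithmetic for two appraisers rating the same parts pass/fail; the judgment vectors are invented for illustration:

```python
import numpy as np

# Hypothetical accept(1)/reject(0) calls by two appraisers on 20 parts
a = np.array([1,1,0,1,0,1,1,0,1,1,1,0,1,1,0,1,1,1,0,1])
b = np.array([1,1,0,1,1,1,1,0,1,0,1,0,1,1,0,1,1,1,1,1])

p_o = np.mean(a == b)                    # observed agreement: 0.85
# chance agreement from each appraiser's marginal accept/reject rates
p_e = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())   # 0.60
kappa = (p_o - p_e) / (1 - p_e)          # 0.625 → inadequate, investigate
```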

AIAG Attribute MSA Example (Table III-C 3)

Pair | Kappa | Verdict
Appraiser A vs B | 0.86 | Good agreement
Appraiser B vs C | 0.79 | Good agreement
Appraiser A vs C | 0.78 | Good agreement
Appraiser | κ vs Reference | Effectiveness | Miss Rate | False Alarm | Verdict
A | 0.88 | — | 6.3% | 4.9% | —
B | 0.92 | 90% | 6.3% | 2.0% | —
C | — | 80% | 12.5% | — | Unacceptable

Effectiveness Acceptance Criteria (Table III-C 6)

Decision | Effectiveness | Miss Rate | False Alarm Rate
Acceptable | ≥ 90% | ≤ 2% | ≤ 5%
Marginal | 80–90% | 2–5% | 5–10%
Unacceptable | < 80% | > 5% | > 10%
📌

Important AIAG caution: A 90% agreement rate on a process with Pp=1.0 doesn't mean 90% of bad parts are caught. Bayes' Theorem must be applied — the probability a rejected part is truly bad depends on the underlying defect rate. At very low defect rates, most "rejected" parts are actually false alarms.
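A quick illustration of that caution, with all rates assumed for the example (90% detection, 5% false alarms, 1,000 ppm defect rate):

```python
p_defect = 0.001       # underlying defect rate (1,000 ppm, assumed)
p_rej_bad = 0.90       # P(reject | truly bad), the detection rate
p_rej_good = 0.05      # P(reject | truly good), the false alarm rate

p_reject = p_rej_bad * p_defect + p_rej_good * (1 - p_defect)
p_bad_given_reject = p_rej_bad * p_defect / p_reject
print(f"{p_bad_given_reject:.1%}")   # ≈ 1.8%: most rejections are false alarms
```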

Signal Detection Approach (for %GRR)

When variable reference data is available, the gray zone width between the last universally-accepted and first universally-rejected part estimates 6σ_GRR:

Boundary Analysis — Gauge Discrimination
d_USL = last-accepted-by-all → first-rejected-by-all (at USL)
d_LSL = same calculation at LSL
d = average of d_USL and d_LSL
GRR_boundary = d / 5.15  (5.15σ = 99% spread)

📊 Attribute MSA Summary

📌

No single appraiser in the AIAG example met ALL three criteria simultaneously. This is the key finding — a system-level decision is needed.

  • Kappa > 0.75

    All pairs met this. Appraisers agree with each other well.

  • Effectiveness

    Only B reached ≥90%. A and C are marginal/unacceptable.

  • Miss Rate

    All three had 6.3%+ miss rate, exceeding the ≤2% threshold. Training needed.

How GRR Distorts Your Cp — AIAG Appendix B

The most important and most overlooked MSA insight: your observed Cp is always lower than your actual process Cp because measurement error inflates the observed variation. Appendix B of AIAG MSA 4th Ed. gives the exact formula.

AIAG Appendix B — Exact Relationships
Process-variation basis
%EV = 100 × (EV / TV)
%AV = 100 × (AV / TV)
%GRR = 100 × (GRR / TV)
%PV = 100 × (PV / TV)
Tolerance basis
%EV = 100 × (EV / Tol)
%AV = 100 × (AV / Tol)
%GRR = 100 × (GRR / Tol)
TV = √(GRR² + PV²)

What This Means in Practice

A high GRR makes your process capability look worse than it really is. This has real consequences: a process may be denied production approval because of its measurement system, not because of the process itself.

🚨

Critical insight: At GRR=70% with Cp_obs=1.30, the actual process Cp is still only 1.04 — barely capable. This means high GRR doesn't just disguise a capable process — it may be masking a barely capable one. Always investigate GRR before concluding a process is incapable.

📊 Appendix B Table — Observed vs Actual Cp

Actual Cp = 1.30, GRR varies (process-based)

GRR % | Cp_obs (process) | Cp_obs (tolerance)
10% | 1.29 | 1.29
20% | 1.27 | 1.26
60% | 1.04 | 0.81
70% | 0.93 | 0.54
90% | 0.57 | never

At GRR=50%, tolerance-based Cp_obs drops to 0.99 — looks incapable even though actual Cp=1.30!

📊 Observed Cp vs Actual Cp (Actual = 1.30) as GRR increases: GRR 10% → 1.29 (nearly unaffected) · GRR 30% → 1.24 (−5% loss) · GRR 50% → 1.13 (−13% loss) · GRR 70% → 0.93 (below 1.0 ✗)

Measurement Tools, Destructive & Non-Destructive Testing

Before you can analyse measurement system variation, you need to select the right measurement tool and understand its capabilities. The Rule of 10 governs tool selection; destructive and NDT methods determine what kind of testing is possible.

Measurement Tools — Precision Hierarchy

Tool | Least count / Resolution | Principle | Typical use
Scale / Tape Measure | 1 mm or 0.5 mm | Direct linear measurement against graduated scale | Rough dimensions, layout
Vernier Caliper | 0.1 mm or 0.05 mm | Main scale + vernier scale alignment | OD/ID/depth/step measurements
Micrometer | 0.01 mm | Screw thread advancement per revolution | Shaft/bore diameters, wall thickness
Gage Blocks (Slip Gauges) | 0.001 mm (1 µm) | Precision ground blocks, stacked by "wringing" (molecular adhesion up to 330 N pull force) | Calibration reference, setting instruments
Optical Comparator | Depends on magnification | Magnified silhouette projected on screen, measured against prescribed limits | Complex profiles, thread forms, gear teeth
💡

Rule of 10 (10:1 Rule): The measuring instrument resolution should divide the tolerance into at least 10 parts. Example: tolerance = ±0.05mm (range = 0.10mm) → minimum instrument resolution = 0.01mm → Digital Vernier (0.01mm) is acceptable; tape measure (1mm) is not. Calibration instruments should be 10× better than the measuring instrument.

Destructive Testing

Destructive tests damage or destroy the test piece. Used when the test must measure failure — cannot be used for 100% inspection. Drives the need for acceptance sampling.

🔩 Tensile Test

Stress-Strain curve analysis. Pulls the specimen to failure.

  • Stress = Force / Area (Pa = N/m²)
  • Strain = ΔLength / Length (unitless)
  • Measures: UTS, yield strength, elongation, Young's modulus
  • Curve shapes: ductile steel, brittle (concrete/carbon fibre), non-ferrous
💥 Charpy Impact Test (V-notch)

Measures notch toughness — ability to absorb energy during fracture. A pendulum swings and strikes a notched specimen.

  • Result: energy absorbed (Joules)
  • Critical for low-temperature applications
  • Identifies brittle-ductile transition temperature
🔄 Fatigue Test

Applies cyclic loading until failure. Most engineering failures are fatigue-related.

  • Determines S-N curve (stress vs cycles to failure)
  • Identifies endurance limit (some steels)
  • Critical for rotating machinery, aircraft structures

Non-Destructive Testing (NDT)

NDT methods inspect materials and components without causing damage — enabling 100% inspection for critical items. Each method has specific capabilities and limitations.

Method | Principle | Detects | Applicable materials
Radiography (X-ray / Gamma) | Radiation passes through material; defects absorb differently and show on film/detector | Internal voids, porosity, inclusions, weld defects | Most materials — metals, composites, castings
Ultrasonic Testing (UT) | Sound waves >20 kHz transmitted into material; reflections from defects detected | Internal defects, thickness measurement, delaminations | Metals, composites, welds
Magnetic Particle (MT) | Magnetic field applied; field leaks at surface/near-surface defects; magnetic particles accumulate | Surface and near-surface cracks | Ferromagnetic materials ONLY (steel, iron)
Liquid Penetrant (PT) | Dye penetrant drawn into surface cracks by capillary action; developer reveals defects | Surface-breaking defects only | Any material — magnetic AND non-magnetic
Hardness Testing | Indenter pressed into surface; hardness = resistance to indentation (Vickers HV, Brinell HB, Rockwell HR) | Material hardness, heat treatment verification | Most solid materials

Crossed vs Nested GR&R Studies

Crossed GR&R Study

Each operator measures every part, and every part is measured multiple times (replicates). This enables separation of the EV (repeatability) and AV (reproducibility) components.

  • ✓ Standard AIAG GR&R method
  • ✓ Used for non-destructive measurements
  • ✓ Provides separate EV, AV, and interaction estimates
  • ✓ Typical design: 3 operators × 10 parts × 2 replicates
Nested GR&R Study

Each operator measures a different set of parts — typically because the measurement destroys the part. Parts are nested within operators; cannot be measured by more than one operator.

  • ✓ Used for destructive tests (tensile, hardness, chemical)
  • ✓ Cannot separate repeatability from part-to-part variation within operator
  • ⚠️ Reproducibility is confounded with part variation
  • ✓ Requires more parts than crossed design
Quality Philosophy

The foundational reference for quality engineering. Covers the evolution of quality, the philosophies of every major quality pioneer, continuous improvement frameworks, strategic planning, facilitation tools, customer relations, supplier management, and barriers to quality improvement.

Evolution of Quality & the Philosophies That Shaped It

Quality management evolved from pure inspection through statistical control, quality assurance, and total quality management into today's business excellence frameworks. Each pioneer contributed a distinct, testable philosophy that forms the foundation of modern quality engineering.

📊 Evolution of Quality — Key Milestones
Inspection → QC → SPC → QA → TQM → Business Excellence
1901 Standardization · 1924 Shewhart control charts · 1950s Deming/Juran → Japan · 1960s Ishikawa/Taguchi · 1979 Crosby — Quality is Free · 1987 Motorola Six Sigma · 1990s+ TQM → Business Excellence

W. Edwards Deming — 14 Points & System of Profound Knowledge

Deming taught that 85–94% of quality problems are caused by the system itself — not the workers. His message to Japan in the 1950s transformed their manufacturing. His framework rests on four areas of Profound Knowledge: appreciation for a system, knowledge about variation, theory of knowledge, and psychology.

# | Point | Core idea | Quality engineering implication
1 | Create Constancy of Purpose | Long-term commitment to improvement; customer focus; invest in innovation & training | Drives design for reliability, not just today's spec compliance
2 | Adopt the New Philosophy | Management must lead change; be prepared for transformation | Quality is not a department — it is a system responsibility
3 | Cease Dependence on Mass Inspection | Build quality into the process; inspection is too late & too costly | Prevention > detection; PFMEA before production, not rework after
4 | End Lowest-Price Purchasing | Move toward single suppliers on long-term trust; multiple suppliers = more variability | Supplier qualification programs, approved vendor lists
5 | Improve Constantly and Forever | PDCA; reduce variation; engage all employees | SPC, DMAIC, continuous capability improvement
6 | Institute Training on the Job | People must know how to do their job; training includes tools and improvement methods | Calibration training, GR&R awareness, SPC chart reading
7 | Institute Leadership | Supervisors are coaches, not police; understand processes | Process owners empowered to stop the line on defects
8 | Drive Out Fear | Mutual respect; workers feel valued and can flag problems freely | Open reporting of defects; psychological safety for quality escalation
9 | Break Down Barriers | Cross-functional teams; internal customer concept; common vision | APQP teams, design-manufacturing-quality integration
10 | Eliminate Slogans & Posters | Slogans assume people cause problems — the system does | Fix the process, not the person; root cause analysis, not blame
11 | Eliminate Numerical Quotas | Quotas without a plan are demoralising; substitute leadership | Capability targets backed by process improvement plans
12 | Remove Barriers to Pride | Abolish annual merit rating that creates competition; recognise craftsmanship | Team-based quality improvement rewards over individual rankings
13 | Institute Education & Self-Improvement | Workers learn new skills to face future challenges | Statistical literacy training; professional development
14 | Take Action — Transform | Transformation is everybody's job; cultural change starts at the top | Quality culture deployment through management commitment
💡

Deming's Chain Reaction: Improve quality → costs decrease (less rework, fewer mistakes) → productivity improves → capture the market → stay in business → provide more jobs. The chain begins with quality, not with cost-cutting.

Joseph Juran — The Quality Trilogy & Fitness for Use

Juran defined quality as fitness for use — not conformance to specification. He emphasised top management involvement, project-by-project improvement, and the Pareto principle (vital few vs. useful many). His Quality Control Handbook (1951) remains the definitive reference.

Quality Planning

Preparing to meet quality goals. Identify customers, determine their needs, develop product/process features that respond to those needs, establish quality goals.

Quality Control

Meeting quality goals during operations. Evaluate actual performance, compare to goals, act on the difference. The ongoing process of holding the gains — SPC, inspection, audits.

Quality Improvement

Breaking through to unprecedented levels of performance. Project-by-project — select the project, organise the team, diagnose causes, implement remedies, hold the gains.

Juran's 10 Steps to Quality Improvement
  1. Build awareness of the need and opportunity for improvement
  2. Set goals for improvement
  3. Organise to reach the goals (establish a quality council, identify problems, select projects)
  4. Provide training
  5. Carry out projects to solve problems
  6. Report progress
  7. Give recognition
  8. Communicate results
  9. Keep score of improvements achieved
  10. Maintain momentum by making annual improvement part of the regular systems and processes

Philip Crosby — Four Absolutes & Quality is Free

Crosby defined quality as conformance to requirements — not goodness or elegance. His 1979 book Quality is Free argued that the cost of poor quality always exceeds the cost of preventing defects. His message to management: the system causes non-conformance, and prevention — not appraisal — is the correct system.

The Four Absolutes of Quality
  1. Definition: Quality is conformance to requirements — not elegance. Do It Right the First Time (DIRFT).
  2. System: The system of quality is prevention, not appraisal. An error that doesn't exist can't be missed.
  3. Standard: The performance standard is zero defects — a management standard, not a motivational slogan.
  4. Measurement: Quality is measured by the Price of Non-Conformance — cost of doing things wrong.
Price of Conformance vs Non-Conformance

Price of Conformance (POC): All expenses necessary to make things right. Quality functions, prevention efforts, quality education, audits.

Price of Non-Conformance (PONC): All expenses involved in doing things wrong — fixing problems, correcting orders, rework, scrap, warranty claims, customer returns.

Crosby's claim: PONC always > POC ∴ Quality is Free

Walter A. Shewhart — Father of Statistical Quality Control

Shewhart invented the control chart in 1924 at Western Electric's Hawthorne Works and introduced the PDCA (Plan-Do-Check-Act) cycle. He was the first to distinguish between common cause (chance) variation and special cause (assignable) variation — the foundational insight behind all SPC.

Key Contributions
  • 📈 Invented the control chart (1924) — X̄-R, p, c, u charts
  • 🔄 Developed the PDSA/PDCA cycle (Shewhart Cycle — later popularised by Deming)
  • 📊 Distinguished common cause (system) from special cause (assignable) variation
  • 📖 Published Economic Control of Quality of Manufactured Product (1931)
Variation Types

Common Cause (Chance): Inherent in the process. Many small, independent sources. Stable and predictable. Only the system (management) can reduce it.

Special Cause (Assignable): An identifiable, specific source outside the system. Intermittent and unpredictable. Operators and engineers can find and fix these.

Pioneer Philosophy Quick-Reference

Pioneer | Quality defined as | Primary framework | Key exam trigger words
Deming | Reduction of variation; customer satisfaction | 14 Points, System of Profound Knowledge, PDCA | "Common cause / special cause", "chain reaction"
Juran | Fitness for use | Quality Trilogy (Planning, Control, Improvement), 10 Steps | "Fitness for use", "project-by-project", "vital few"
Crosby | Conformance to requirements | Four Absolutes, Zero Defects, PONC/POC | "Conformance to requirements", "zero defects", "prevention"
Shewhart | Statistical control | Control charts, PDCA cycle, common/special cause | "Control chart", "assignable cause", "PDSA"
Taguchi | Minimum loss to society | Loss function, robust design, parameter/tolerance design | "Loss function", "nominal is best", "signal-to-noise"
Ishikawa | Total quality through all employees | Cause-and-effect diagram, QC circles, 7 tools | "Fishbone", "cause-and-effect", "QC circles"

Continuous Improvement Frameworks

Five major CI frameworks every quality engineer needs to understand — how they relate, where they differ, and when to apply each.

Lean — Eliminate Waste, Maximise Flow

Lean originated with Ford's mass production principles (1910s) and was systematised into the Toyota Production System (TPS) in the 1950s. James Womack, Daniel Roos, and Daniel Jones documented it for the West in The Machine That Changed the World (1990). Lean identifies eight types of waste (DOWNTIME) and organises the entire enterprise around delivering value at the rate demanded by the customer.

The 5 Lean Principles
  1. Value: Specify what creates value from the customer's perspective — not the producer's.
  2. Value Stream: Map all steps in the process chain; eliminate non-value-adding steps.
  3. Flow: Make value-creating steps flow without interruption, batching, or waiting.
  4. Pull: Produce only what is needed by the customer — short-term response to demand rate (takt time).
  5. Perfection: Continuously pursue elimination of all waste; the process never ends.
Lean Benefits
  • ✓ Reduced waste (DOWNTIME: Defects, Overproduction, Waiting, Non-utilised talent, Transport, Inventory, Motion, Extra processing)
  • ✓ Improved quality and customer satisfaction
  • ✓ Reduced inventory and cycle time
  • ✓ Flexible manufacturing capability
  • ✓ Safer workplace and improved employee morale

Six Sigma — Reduce Variation to Near-Zero Defects

Motorola developed Six Sigma in 1987, raising quality standards dramatically. AlliedSignal (now Honeywell), GE, Dow Chemical, DuPont, Whirlpool, and IBM adopted it in the mid-1990s, proving its cross-industry applicability.

🎯
Know CTQs

Identify what's critical to quality from the customer's perspective

📉
Reduce Defects

Drive DPMO down; measure defects per million opportunities

Centre on Target

Minimise deviation of mean from nominal target value

Reduce Variation

Tighten standard deviation; narrow the process spread

Theory of Constraints (TOC) — Focus on the Weakest Link

Introduced by Eliyahu Goldratt in The Goal (1984). TOC holds that every system has exactly one constraint limiting overall throughput at any given time. Improving a non-constraint does not improve the system — only improving the current constraint does.

TOC Step | Action | Key principle
1. Identify | Find the current constraint — the weakest link in the chain | Physical, Policy, Paradigm, or Marketplace constraints
2. Exploit | Squeeze maximum performance from the constraint using existing resources — no new investment yet | Don't waste constraint capacity on anything non-essential
3. Subordinate | Align all other activities to support the constraint's pace | A non-constraint running faster than the constraint builds WIP, not output
4. Elevate | If the constraint persists after exploiting and subordinating, invest to break it | Add capacity, change the process, redesign
5. Repeat | Once broken, a new constraint will emerge — return to step 1 | Continuous improvement is never finished

Total Quality Management (TQM)

TQM is a management approach to achieving customer satisfaction through every person in the organisation working to continuously improve products, processes, and services. Unlike Six Sigma (project-focused) or Lean (waste-focused), TQM is a cultural philosophy. Most quality awards (Baldrige, EFQM, Deming Prize) are grounded in TQM principles.

TQM Core Principles
  • 🎯 Customer focus — internal and external customers
  • 🔄 Continuous improvement (Kaizen) — forever and ever
  • 👥 Total employee involvement — every person owns quality
  • 📊 Process approach — manage activities as interconnected processes
  • 🤝 Supplier partnerships — extend quality into the supply chain
CI Framework Comparison
Framework | Primary focus | Methodology
Lean | Waste elimination, flow | Value stream mapping, 5S, Kaizen
Six Sigma | Variation reduction, defects | DMAIC, statistical analysis
TOC | Throughput, bottleneck | 5 focusing steps, drum-buffer-rope
TQM | Culture, customer satisfaction | Quality awards, customer surveys
SPC | Process stability and capability | Control charts, capability studies

Strategic Planning, Deployment & Information Systems

Strategic planning aligns the quality function with organisational goals — covering planning frameworks, deployment tools, and performance measurement including the Balanced Scorecard, leading vs lagging indicators, and project management techniques.

Strategic Planning — VMOSA Framework

V
Vision

The dream — what the organisation aspires to become in the long term

M
Mission

What the organisation does and why it exists — the purpose statement

O
Objectives

How much of what — specific, measurable goals to achieve the mission

S
Strategies

How — broad approaches used to achieve each objective

A
Action Plans

Who will do what by when — the specific tasks assigned to specific people

Balanced Scorecard — Kaplan & Norton

Developed by Robert Kaplan and David Norton, the Balanced Scorecard translates strategy into four perspectives of performance measurement — preventing over-reliance on financial metrics alone. Quality professionals use it to frame the value of quality investments in language executives understand.

💰 Financial Perspective

How do we look to shareholders? Revenue growth, profitability, cost reduction, ROI. Quality metric: Cost of Poor Quality (COPQ) as % of sales revenue.

🎯 Customer Perspective

How do customers see us? Satisfaction scores, NPS, on-time delivery, defect rates in the field, warranty claims per unit.

⚙️ Internal Processes Perspective

What must we excel at internally? Process yield, Cpk levels, first-pass yield, defect rate, audit outcomes, cycle time.

📚 Learning & Growth Perspective

Can we continue to improve and create value? Training hours, certifications (ASQ, IASSC), employee engagement, suggestion rate, new quality tools adopted.

Leading vs Lagging Indicators

Type | Definition | Characteristics | Quality examples
Lagging Indicators | Post-event (output) measures — what has already happened | Easy to measure, historically accurate, but cannot prevent what already occurred | DPMO, defect rate, warranty returns, customer complaints, scrap cost, Cpk
Leading Indicators | Predictive (input) measures — early signals of future performance | Difficult to identify and validate; harder to measure; not guaranteed predictors | Training hours, PFMEA completion %, process audit scores, SPC chart compliance, supplier qualification status
💡

Best practice: Use a mix of both. Lagging indicators tell you what happened; leading indicators tell you where you're heading. A dashboard with only lagging metrics is a rearview mirror — add leading metrics to steer the process proactively.

Stakeholder Identification & Analysis

ISO 9001:2015 clause 4.2 requires organisations to determine interested parties and their requirements. Stakeholder analysis maps each party by their level of interest and power/influence, then defines the appropriate engagement strategy.

Stakeholder Power-Interest Grid
  • KEY PLAYERS — High power, High interest → Manage closely
  • LATENTS — High power, Low interest → Keep satisfied
  • DEFENDERS — Low power, High interest → Keep informed
  • APATHETICS — Low power, Low interest → Monitor
Stakeholder Examples
  • 👔 Internal: Owners, managers, employees, partners
  • 🏭 Supply chain: Suppliers, sub-tier suppliers
  • 🛒 Market: Customers, end users
  • 🏛️ External: Regulators, industry associations, media, local community
ISO 9001:2015 §4.2 — Monitor and review stakeholder requirements

Quality Information System (QIS)

A QIS is the data-centric infrastructure of the quality management function — the systems used to collect, store, analyse, and report quality-related data across the organisation.

Data Captured by a QIS
  • 📋 Design reviews and change records
  • 🔍 Audit findings and corrective actions
  • ⚠️ Non-conformances and dispositions
  • 🔧 Repairs, returns, warranty claims
  • 😊 Customer satisfaction surveys
  • 📊 Test reports, certificates, performance data
QIS Benefits
  • ✓ Identifies priorities for improvement investment
  • ✓ Tracks performance of quality initiatives and ROI
  • ✓ Enables competitor performance benchmarking
  • ✓ Breaks silos — all departments access the same quality data
  • ✓ Supports fact-based decision making at every level

Team Dynamics, Leadership & Facilitation Tools

Effective quality improvement requires high-performing teams — covering team types, the Tuckman model of team development, team roles, and the facilitation tools used in quality projects.

Team Types

Team Type | Description | Quality context
Functional | Members from same department/function with similar expertise | Quality lab team, inspection team, calibration group
Cross-Functional | Members from multiple departments working on a shared goal | APQP team, PFMEA team, 8D corrective action team
Virtual | Geographically dispersed team relying on technology to collaborate | Global supplier quality teams, multi-site audit teams
Self-Managed | Team with authority to set own goals, methods, and schedules | Autonomous production cells with built-in quality responsibility
Quality Circles | Voluntary groups of front-line workers meeting regularly to identify and solve quality problems — introduced by Ishikawa | Shop-floor improvement groups, Kaizen circles

Tuckman Model of Team Development

Bruce Tuckman's five-stage model (1965, extended 1977) describes the predictable journey teams undergo from formation to high performance. Understanding which stage a team is in allows a leader or facilitator to apply the right intervention.

👋
FORMING

Members first come together; polite, uncertain about roles and goals; depend on leader for direction

STORMING

Conflict emerges; teamwork harder than expected; power struggles; important not to suppress but navigate

🤝
NORMING

Team moves beyond storming; norms established; collaboration improves; roles clarified

🚀
PERFORMING

High performance; team is self-directing; interdependent; focused on goals

👋
ADJOURNING

Task complete; team disbands; celebrate achievements, capture lessons learned

Team Roles — Leader, Facilitator, Coach, Members

Role | Primary responsibilities | Key distinction
Leader | Provides direction; clarifies roles; establishes ground rules; ensures goal completion; conducts meetings; assigns tasks | Has formal authority and accountability for the team's output
Facilitator | Helps the team understand its objective and how to achieve it; guides process without dictating content | No formal authority to make decisions — leads by process, not position
Coach | One-to-one support after training; first point of contact for issues; uses GROW model | Develops individuals; not the same as a trainer (one-to-many)
Members | Participate actively in meetings; perform assigned tasks; contribute ideas in brainstorming | Own the work; the team's subject matter experts
💡

GROW Coaching Model: Goal — what does the team/individual want to achieve? Reality — what is the current state and what challenges exist? Obstacles — what is stopping progress? Way forward — what specific steps will be taken and by when?

Facilitation Tools

🧠 Brainstorming

Group or individual technique to generate ideas spontaneously for a specific problem. Quantity over quality — defer all judgment during generation.

Four Rules:
  • 1. Focus on quantity — more ideas = more options
  • 2. Withhold criticism — no evaluation during generation
  • 3. Welcome unusual ideas — wild ideas often spark practical ones
  • 4. Combine and improve — build on others' ideas (1+1=3)
📋 Nominal Group Technique (NGT)

Structured process for problem identification, solution generation, and group decision-making. Prevents dominant voices from controlling the output.

Five Steps:
  1. Introduction and explanation of the problem
  2. Silent individual generation of ideas (written)
  3. Round-robin sharing — one idea per person per turn
  4. Group discussion and clarification
  5. Voting and ranking to reach group decision
🗳️ Multi-Voting

Used after brainstorming generates a long list — reduces/narrows the list using group consensus without endless debate.

Each member selects their top N ideas and ranks them (e.g. top 5, scored 5 down to 1). Scores are summed — highest total = group priority. Repeat until a manageable shortlist remains.
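A minimal Python sketch of that scoring scheme (the ballot contents are hypothetical):

```python
from collections import defaultdict

def multi_vote(ballots, top_n=5):
    """Each ballot is a ranked list of ideas (best first). The top pick
    scores top_n points, the next top_n - 1, and so on; summed totals
    decide the group shortlist."""
    scores = defaultdict(int)
    for ballot in ballots:
        for rank, idea in enumerate(ballot[:top_n]):
            scores[idea] += top_n - rank
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ballots from three team members
ballots = [
    ["fixture redesign", "operator training", "new gauge", "5S", "lighting"],
    ["operator training", "fixture redesign", "lighting", "5S", "new gauge"],
    ["fixture redesign", "new gauge", "operator training", "lighting", "5S"],
]
for idea, score in multi_vote(ballots):
    print(f"{score:>2}  {idea}")
```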

⚖️ Force Field Analysis

Identifies and maps the forces driving change against the forces resisting it. Developed by Kurt Lewin. Used in change management and improvement planning.

Driving forces (strengthen these): customer demand for fewer defects, competitive pressure, lower downtime, increased sales opportunity. Restraining forces (weaken these): initial investment cost, fear of new technology, habit/inertia.

Conflict Resolution — Thomas-Kilmann Model

Style | Concern for Self | Concern for Others | When to use
Competing | High | Low | Safety emergencies; critical quality hold decisions; when you know you're right
Collaborating | High | High | Complex quality problems requiring buy-in from all parties; best long-term solution matters
Compromising | Medium | Medium | When a temporary solution is needed; when both parties have equally valid goals
Avoiding | Low | Low | When the issue is trivial; when more information is needed before engaging
Accommodating | Low | High | When preserving the relationship matters more than the outcome; when you're wrong

Customer Relations & Supplier Management

Quality professionals must manage both directions of the value chain — understanding and capturing customer requirements, and ensuring suppliers deliver conforming product and services reliably.

Supplier Lifecycle Management

With mid-to-large corporations spending ~50% of revenue on purchased goods and services, supplier management is critical to organisational success. The Supplier Lifecycle Management framework is a structured, end-to-end approach to managing suppliers transparently, mitigating risk, reducing costs, and building long-term partnerships.

① Selection & Qualification

Identify → Shortlist → Prequalify → Bidders list → RFP/RFQ → Evaluate → Award. Includes sub-tier supplier identification.

② Performance Monitoring

Set performance expectations; process reviews; evaluations against KPIs (cost, quality, schedule, responsiveness); improvement plans; exit strategies.

③ Classification

Tier suppliers: Non-approved → Approved → Preferred → Certified → Partnership → Disqualified. Classification drives audit frequency and oversight level.

④ Partnerships & Alliances

Develop strategic customer-supplier partnerships; shared improvement initiatives; joint development; supply chain resilience strategies.

Supplier Selection Process

Step | Activity | Key considerations
1. Identify | Find potential suppliers; new suppliers may offer cost or quality advantage; promote local suppliers | Market research, industry directories, referrals
2. Shortlist | Screen to avoid late delivery, poor quality, non-responsive suppliers | Market reputation, public information, financial health
3. Prequalify | Assess financial stability, capacity, quality certifications (ISO 9001), client approvals | On-site surveys, questionnaires, certificate verification
4. Bidders List | Maintain a qualified list to avoid repeating prequalification each time | Approved Vendor List (AVL) maintenance
5. Request Bids | RFP — buyer states preferences, bidder explains how they'll meet them. RFQ — buyer provides exact spec, bidder quotes a price | Choose RFP when requirements are not fully defined
6. Evaluate Bids | Score against pre-determined criteria: price, quality, schedule, commercial terms, financial stability, production capability, HSE responsibility | Weighted scoring matrix; multi-person evaluation team
7. Award | Place Purchase Order with selected supplier | Contractual quality requirements, inspection criteria, escalation process

Supplier Performance Monitoring Parameters

💰 Cost
  • Under/over budget variance
  • Cost savings achieved
  • Cost-reduction proposals
✅ Quality
  • Incoming defect rate (PPM)
  • Returns and failures
  • Corrective action closure rate
📅 Schedule
  • On-time delivery %
  • Shortage incidents
  • Lead time vs committed
📞 Responsiveness
  • Response time to queries
  • Flexibility to order changes
  • Escalation engagement

Risk Management, Business Continuity & Barriers to Quality

Risk — ISO 31000 Definition & Framework

Risk = the effect of uncertainty on objectives (ISO 31000:2009). An effect is a deviation from the expected — positive (opportunity) or negative (threat). A risk that has already occurred is reclassified as an issue. Risk is characterised by its potential consequences and the likelihood of occurrence.

Risk Management Step | Activity | Quality tool
1. Identify Risks | List all potential threats and opportunities that could affect objectives | FMEA, HAZOP, brainstorming, risk register
2. Prioritise Risks | Score by probability × impact; focus resources on high-priority risks | Risk matrix (5×5), RPN in FMEA
3. Mitigation Control | Define actions to reduce probability and/or impact of each risk | Control plans, poka-yoke, redundancy
4. Mitigation Effectiveness | Monitor whether controls are working; update risk register | KPIs, audits, leading indicators tracking
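A minimal Python sketch of step 2, probability × impact scoring on 1–5 scales. The banding thresholds and the register entries are hypothetical — set them per your own risk-acceptance criteria:

```python
def risk_score(probability, impact):
    """Probability x impact on 1-5 scales -> score 1-25."""
    return probability * impact

def priority_band(score):
    """Hypothetical banding for a 5x5 matrix."""
    if score >= 15:
        return "HIGH — mitigate now"
    if score >= 8:
        return "MEDIUM — plan controls"
    return "LOW — monitor"

register = [
    ("Supplier insolvency",      2, 5),
    ("Gauge calibration lapse",  3, 3),
    ("Incoming material mix-up", 4, 4),
]
for risk, p, i in sorted(register, key=lambda r: r[1] * r[2], reverse=True):
    s = risk_score(p, i)
    print(f"{risk:<26} P={p} I={i} score={s:>2} -> {priority_band(s)}")
```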
Business Continuity Plan (BCP)

A system of prevention and recovery for potential threats to the organisation. Covers extreme, existential scenarios.

Common threats: Fire, flood, earthquake, strike, war, power outage, cyber attack, terrorist attack.

Contingency Planning

A plan for outcomes other than the expected — less extreme than BCP. Covers probable disruptions.

Examples: Supplier bankruptcy, price/currency fluctuation, component discontinuation, key personnel departure.

Resiliency

The capacity to rapidly adapt and recover from internal or external disruptions. IBM identifies six building blocks of resilience:

Recovery · Hardening · Redundancy · Accessibility · Diversification · Autonomic Computing

Supply Chain Risk Categories

Where | Risk category | Examples
At Supplier | Natural causes | Flood, earthquake, wildfire destroying plant or inventory
At Supplier | Man-made causes | Strike, fire, civil unrest, quality failure, management change
At Supplier | Economic causes | Insolvency, sub-supplier failure, currency collapse, credit freeze
In Transit | Natural or man-made | Port closure, transport strike, customs hold, damage in transit
On Receipt | Quality or reputational | Defective product, counterfeit parts, labelling errors, regulatory non-compliance

Barriers to Quality Improvement

Understanding why quality improvement initiatives fail is as important as knowing how to run them. Engineering practice — and certification exams — test recognition of these barriers and their countermeasures.

Common Barriers
  • 🔀 Confusion over the definition of quality — when quality means different things to different stakeholders, initiatives fragment
  • 👤 Lack of leadership — quality improvement without visible management commitment fails at the first obstacle
  • Short-term thinking — quality ROI is often long-term; pressure for immediate financial results kills improvement programs
  • 📊 Lack of data — unable to quantify the magnitude of the problem or the benefit of fixing it
  • 🎓 Insufficient qualified people — quality improvement requires statistical literacy and tool expertise (Black Belt, quality engineering, etc.)
Countermeasures
  • ✓ Align on a single, clear quality policy — signed by top management and communicated to all
  • ✓ Visibly involve senior leaders in quality reviews, audits, and improvement projects
  • ✓ Link quality metrics to the Balanced Scorecard to give them financial language
  • ✓ Build a QIS to capture and surface data that quantifies the cost of poor quality
  • ✓ Invest in CQT/Black Belt certifications; develop internal quality competency

ASQ Code of Professional Ethics — Three Pillars

① Integrity & Honesty

Be truthful in all professional interactions. Accurately represent qualifications, certifications, and affiliations. Offer services only within areas of genuine competence. Make decisions in an objective, factual manner.

② Responsibility, Respect & Fairness

Hold paramount the safety, health, and welfare of individuals and the public. Treat others fairly, courteously, with dignity, and without discrimination. Act in a socially responsible manner.

③ Proprietary Information & Conflicts

Protect confidential information; never use it for personal gain. Disclose and avoid real or perceived conflicts of interest. Give credit where due; do not plagiarise. Obtain and document permission to use others' intellectual property.

Classification of Quality Characteristics

Understanding what quality means to different stakeholders — from product performance to service interactions — is foundational to the quality engineer's Body of Knowledge. Three frameworks define quality characteristics at different levels of abstraction.

Garvin's 8 Dimensions of Product Quality

David Garvin (Harvard, 1987) proposed that quality is multi-dimensional — a product can be high quality on one dimension and poor on another. This prevents organisations from optimising a single metric at the expense of overall customer value.

# | Dimension | Definition | Quality engineering relevance
1 | Performance | Primary operating characteristics — does the product do what it should? | CTQ characteristics, functional specifications, Cpk targets
2 | Features | Secondary supplementary attributes that enhance the basic function | Voice of Customer (QFD), feature vs cost trade-offs
3 | Reliability | Probability that the product performs its intended function over time without failure | MTBF, Weibull analysis, bathtub curve, reliability testing
4 | Conformance | Degree to which a product meets pre-established standards and specifications | Cpk, DPMO, attribute inspection, MIL-STD-1916
5 | Durability | Useful life of the product before replacement is preferable to repair | Accelerated life testing, design for reliability
6 | Serviceability | Speed, courtesy, competence, and ease of repair | MTTR, design for maintainability, spare parts availability
7 | Aesthetics / Style | How the product looks, feels, sounds, tastes, or smells — subjective | Visual inspection standards, appearance audits, colour matching
8 | Perceived Quality | Reputation and image — what the customer believes based on brand and word of mouth | Customer satisfaction surveys, NPS, warranty claim rates
💡

Key relationships: Reliability = MTBF/failure rate. Conformance = meets spec/Cpk. Serviceability = MTTR/maintainability. Perceived quality = customer perception/surveys.

SERVQUAL — Service Quality Dimensions

Parasuraman, Zeithaml, and Berry (1985) identified 10 service quality dimensions that customers use to evaluate service. These were later consolidated into 5 dimensions — the RATER model.

Original 10 SERVQUAL Dimensions

  1. Reliability
  2. Responsiveness
  3. Competence
  4. Access
  5. Courtesy
  6. Communication
  7. Credibility
  8. Security
  9. Understanding the customer
  10. Tangibles

Consolidated to 5 — The RATER Model

R — Reliability

The ability to perform the promised service dependably and accurately

A — Assurance

Knowledge and courtesy of employees; their ability to convey trust and confidence

T — Tangibles

Appearance of physical facilities, equipment, personnel, and communication materials

E — Empathy

Provision of caring, individualised attention to customers

R — Responsiveness

Willingness to help customers and provide prompt service

Lean Deep-Dive — Waste, Metrics, SMED & Visual Controls

Lean is built on one fundamental idea: waste exists in all processes at all levels. Eliminating waste is the key to successful lean implementation and the most effective way to increase profitability without capital investment.

Muda, Mura & Muri — The Three Types of Waste

Muda — 無駄
Activity that is wasteful / non-value-adding

Type I Muda (Incidental): Non-value-added tasks that seem necessary — business conditions must change to eliminate them (e.g. regulatory inspections).

Type II Muda (Pure Waste): Non-value-added tasks that can be eliminated immediately — no business justification.

Mura — 斑
Unevenness / variation leading to imbalance

Mura exists when workflow is out of balance or workload is inconsistent. Creates alternating overloading and underloading.

SMED reduces Mura by enabling smaller batch sizes and more frequent changeovers — smoothing out production flow.

Muri — 無理
Overburden — unreasonable stress on people/equipment

For people: too heavy a mental or physical burden — leads to quality errors, injuries, and absenteeism.

For machines: running beyond designed capacity — leads to breakdowns and quality deterioration.

8 Types of Muda — DOWNTIME

The original Toyota Production System identified seven types of muda. Western lean practitioners added an eighth — under-utilised staff (knowledge, talent, and creativity). The DOWNTIME acronym covers all eight (the older TIMWOOD mnemonic covers the original seven):

Letter | Waste | Definition | Example
D | Defects | Sorting, rework, repetition, or making scrap | Welding defects requiring re-weld; wrong labels requiring replacement
O | Overproduction | Producing too much, too early, and/or too fast | Printing 1,000 brochures when only 200 are needed
W | Waiting | People or parts waiting for a work cycle to finish | Operator idle while machine cycles; material waiting in queue
N | Non-utilised talent | Failure to exploit employees' knowledge, skills, and creativity | Asking assembly workers to follow instructions without seeking their improvement ideas
T | Transportation | Unnecessary movement of people or parts between processes | Moving parts from one building to another before assembly
I | Inventory | Materials parked and not having value added to them | Raw material sitting in a warehouse for 3 weeks
M | Motion | Unnecessary movement of people or parts within a process | Operator walking 15 m to get tools that could be stored at the workstation
E | Extra Processing | Processing beyond what the customer requires or demands | Polishing a surface that will be hidden; generating reports nobody reads

Standard Work

Standard Work means doing work in a standard way — one best-known method, followed consistently by all people for that task. It is the foundation of quality, safety, and continuous improvement.

  • ✓ All people perform one task in one way only
  • ✓ Eliminates variation caused by different methods
  • ✓ Makes abnormalities immediately visible
  • ✓ Improvements lead to revised standard work — the PDCA cycle applied to work methods
  • ✓ Not "the boss's way" — the best-known way, documented and agreed
Standard Work Documents
  • Standard Work Chart: Shows sequence of tasks, times, and movement in a cell layout
  • Job Instruction Sheet: Step-by-step WI with quality checkpoints and safety notes
  • Time Observation Sheet: Records actual vs takt time — identifies bottlenecks

Process Flow Metrics — Takt, Cycle, Lead Time & Throughput

Metric | Definition | Formula | Worked example
WIP (Work In Progress) | Partially finished goods in the process waiting for completion | — | 50 units partially assembled on the production floor
WIQ (Work In Queue) | Material at a workstation waiting to be processed (a subset of WIP) | — | 12 units waiting in the queue at Process 3 (the bottleneck)
Touch Time | Time material is actually being worked on — excludes moving and waiting | — | 30-minute cycle time with 8 min of actual machining → touch time = 8 min
Takt Time | Time available to produce one unit to meet customer demand | Takt = Net available time / Demand | 40 hrs/week, 10 units/week → Takt = 4 hrs/unit. With 1 hr/day of breaks (5 hrs/week): Net = 35 hrs → Takt = 3.5 hrs/unit
Cycle Time | How long it takes to complete one process step from start to finish | CT = 1 / Throughput | If takt = 3.5 hrs and Process 3 takes 3.5 hrs → balanced. If it takes 4 hrs → bottleneck.
Lead Time | Total time from work requested to work delivered — includes all waiting and processing time | LT = WIP / Throughput (Little's Law) | WIP = 50 units, throughput = 10 units/day → Lead Time = 5 days
Throughput Rate | Average number of units processed per time unit | TR = 1 / Cycle Time | Cycle time = 20 min → TR = 3 units/hr → 24 units per 8-hr shift
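The formulas above reduce to a few lines of Python; this sketch reproduces the worked examples (function names are illustrative):

```python
def takt_time(net_available_hrs, demand_units):
    """Takt = net available time / customer demand."""
    return net_available_hrs / demand_units

def lead_time(wip_units, throughput_per_day):
    """Little's Law: LT = WIP / throughput."""
    return wip_units / throughput_per_day

def throughput_rate(cycle_time_min):
    """TR = 1 / cycle time, expressed here in units per hour."""
    return 60 / cycle_time_min

print(f"Takt: {takt_time(35, 10):.1f} hrs/unit")           # 3.5 hrs/unit
print(f"Lead time: {lead_time(50, 10):.0f} days")           # 5 days
print(f"Throughput: {throughput_rate(20):.0f} units/hr")    # 3 units/hr
```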

SMED — Single-Minute Exchange of Die

What & Who

SMED is a lean methodology for rapidly converting a manufacturing process from running one product to running the next. Developed by Shigeo Shingo. "Single-Minute" means less than 10 minutes (single digit) — not literally 1 minute.

Benefits
  • ✓ Reduced inventory (smaller economic batch sizes)
  • ✓ Increased machine utilisation despite more changeovers
  • ✓ Elimination of setup errors
  • ✓ Reduced defect rates (less scrap at startup)
  • ✓ Reduces Mura — balances production line
8 Techniques for Implementing SMED
  1. Separate internal from external setup operations (internal = machine must stop; external = can be done while machine runs)
  2. Convert internal to external setup
  3. Standardise function, not shape
  4. Use functional clamps or eliminate fasteners altogether
  5. Use intermediate jigs
  6. Adopt parallel operations
  7. Eliminate adjustments
  8. Mechanisation

Visual Controls — Andon & Jidoka

Visual Controls — 4 Types
Type | Question answered | Examples
Identification | What is it? | Labels, colour-coded bins, part numbers
Informational | What is the current status? | Andon lights, production boards, KPI dashboards
Instructional | How should the task be performed? | WI posted at workstation, standard work charts
Planning | What is the plan? | Kanban boards, production schedules, Gantt charts
Andon — Status Indicator Light

A visual control device that indicates the status of a machine, line, or process at a glance:

Green — Normal operations
Yellow — Changeover or planned maintenance due
Red — Problem occurred, machine/line is stopped
Jidoka — Automation with a Human Touch

The ability to stop work (machine or line) when a problem is detected. Prevents defects from being passed downstream and ensures immediate corrective action. The Andon system is the device that activates Jidoka by signalling the problem.

OEE — Overall Equipment Effectiveness

OEE measures how effectively a manufacturing operation is utilised, combining availability, performance, and quality into a single metric. World-class OEE is generally considered to be ≥85%.

OEE = Availability × Performance × Quality
Component | Formula | Measures
Availability | Run Time / Planned Production Time | Unplanned downtime losses
Performance | Actual Output / Max Possible Output | Speed losses and minor stoppages
Quality | Good Parts / Total Parts Produced | Defects and rework losses
OEE Worked Example
Planned time: 8 hrs = 480 min
Downtime: 60 min
Run time: 420 min
Availability: 420/480 = 87.5%

Ideal cycle time: 1 min/part
Actual output: 400 parts (420 possible)
Performance: 400/420 = 95.2%

Good parts: 380 of 400
Quality: 380/400 = 95.0%

OEE = 87.5% × 95.2% × 95.0% = 79.1%
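A minimal Python sketch reproducing the worked example above (the exact product is 79.2%; multiplying the rounded component percentages gives the 79.1% shown):

```python
def oee(planned_min, downtime_min, ideal_cycle_min, total_parts, good_parts):
    """OEE = Availability x Performance x Quality."""
    run_time = planned_min - downtime_min
    availability = run_time / planned_min                   # 420/480 = 87.5%
    performance = total_parts * ideal_cycle_min / run_time  # 400/420 = 95.2%
    quality = good_parts / total_parts                      # 380/400 = 95.0%
    return availability, performance, quality, availability * performance * quality

a, p, q, o = oee(planned_min=480, downtime_min=60,
                 ideal_cycle_min=1.0, total_parts=400, good_parts=380)
print(f"A = {a:.1%}, P = {p:.1%}, Q = {q:.1%}, OEE = {o:.1%}")
```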
💡

World-class benchmark: Availability ≥90%, Performance ≥95%, Quality ≥99.9% → OEE ≥85%

Root Cause Analysis — Finding the Real Problem

Most organisations fix the same problems over and over. Root cause analysis (RCA) breaks that cycle by asking why until the true source of a problem is found — then eliminating it permanently. Based on ASQ sources including Andersen & Fagerhaug and Duke Okes.

The Core Idea
🩹
Symptom Fix
"The machine keeps jamming."

→ Clear the jam. Back to work.
Problem returns next week.
🔍
Physical Cause Fix
"A worn guide rail is causing the jam."

→ Replace the rail.
Problem stays away — until the next part wears.
⚙️
System / Root Cause Fix
"No PM schedule exists for guide rails."

→ Create a preventive maintenance process.
The class of problem is eliminated.

Only a system-level cause — a change to the way the organisation operates — truly prevents recurrence. Physical cause fixes are necessary but not sufficient.

The Cause Hierarchy — Drilling Down

Every visible problem sits at the top of an iceberg. Below it are layers of cause. Most organisations only fix the visible tip.

The Cause Iceberg
👁
Symptom (Visible)
The jam. The defect. The complaint.
⬇️
First-level Cause
The worn rail. The missing label.
⬇️
Higher-level Cause
No inspection process. Poor training.
🎯
Root Cause (System)
No maintenance policy exists.
Physical Cause

The tangible, material thing that failed or caused the event. Also called direct, immediate, or proximate cause. Fixing it is necessary — but only solves this occurrence.

Human Cause

Human error, forgetfulness, or lack of skill. Critical: don't stop here. Ask what system failed to support the human. Blame eliminates people, not problems.

System / Latent Cause ← Find This

A policy, procedure, training gap, or organisational decision that created the conditions for the failure. This is the root cause. Fixing it changes how the organisation operates — preventing the whole class of problem.

The 6-Step RCA Process — The Story Arc

Think of RCA as a detective story. You start with a crime scene (the event), gather evidence (causes), interrogate witnesses (data), find the culprit (root cause), and change the system so it can never happen again.

Step 1
🔎
Define the Event

Write a precise, unambiguous description of the problem. Answer: What? When? Where? Who? How often? What consequences?

Vague: "The process is slow."
Precise: "Window replacement takes 47 min avg vs 20 min standard, occurring 3× weekly since Jan, costing $8,400/mo in overtime."
Step 2
🗺️
Find Causes

Map the process with a flowchart. Brainstorm all possible causes. Use a fishbone (Ishikawa) diagram to organise them into categories.

Key categories: Equipment · Environment · Methods · Materials · Measurement · People
Step 3
🎯
Find the Root Cause

Use the 5 Whys to drill down. Build a cause-and-event tree. Use Pareto to prioritise. Don't declare success too early.

Rule: Keep asking "why" until you reach something the organisation can change — a policy, process, or system.
Step 4
💡
Find Solutions

Generate solutions using "Why Not" principles. Use an Impact/Effort matrix to select the best option. Involve those who will implement.

Analogy thinking: how has another industry solved a similar problem? Don't be constrained by how things are currently done.
Step 5
🚀
Take Action

Use a Force Field Analysis to anticipate resistance. Run a pilot. Assign clear ownership. Be patient — lasting change takes time.

Involve those who must change their work. A solution designed against people is a solution that will fail.
Step 6
📊
Measure & Assess

Track the metrics that defined the problem in Step 1. Confirm the solution works. Assess effectiveness over time.

If the problem returns — the root cause was not truly found. Return to Step 3, not Step 5.

The 5 Whys — A Worked Example

Developed at Toyota as part of the TPS. The idea: keep asking "why" until you reach the system-level cause. Five iterations is a guideline — stop when you reach something that can be permanently changed.

Scenario
A lamp manufacturer is scrapping 12% of finished assemblies due to dimensional variation in lamp holders from a supplier.
Why #
Question
Answer
Why 1
Why are lamp holders out of spec?
Supplier dimensions vary beyond tolerance.
Why 2
Why does supplier variation exceed tolerance?
No dimensional specification was communicated to the supplier.
Why 3
Why was no specification communicated?
Procurement selected supplier on price only. Engineering was not involved.
Why 4
Why wasn't engineering involved in supplier selection?
No cross-functional supplier approval process exists.
Why 5 ✓
Why is there no cross-functional approval process?
Procurement policy only requires lowest price. Quality and engineering sign-off is not mandated. ← Root Cause
Cost reality check: Procurement saved ~$50,000/yr on purchase price. The rework and scrap cost from the same decision? Over $200,000/yr. The root cause was a procurement policy that optimised the wrong metric.

RCA Toolbox — The Right Tool for Each Step

Step 1–2 · Mapping

Fishbone (Ishikawa) Diagram

Organises possible causes into 6M categories: Machine, Method, Material, Man, Measurement, Mother Nature. The "spine" points to the problem; "bones" are cause categories.

[Fishbone sketch: spine points to the EFFECT; bones — Machine (wear), Method (no SOP), Material (wrong spec), Man (untrained), Measurement (gauge drift), Mother Nature (humidity)]
Best for: brainstorming sessions with cross-functional teams where all possible causes are unknown.
Step 3 · Drilling Down

5 Whys

Ask "why" repeatedly until a system-level cause is reached. Simple, fast, and effective for straightforward problems. For complex issues, use a Cause-and-Event Tree.

WHY 1: Machine keeps jamming → WHY 2: Guide rail is worn → WHY 3: No maintenance schedule exists → WHY 4: PM was not assigned to anyone → WHY 5 (ROOT CAUSE): No PM ownership policy exists
Warning: it is possible to arrive at the wrong root cause if evidence is not collected carefully.
Step 2–3 · Data Analysis

Pareto Chart

Ranks causes by frequency or cost. Reveals the vital few from the trivial many. The 80/20 principle — 20% of causes typically create 80% of problems.

[Pareto chart: defect categories — Dimensional, Surface, Burrs, Colour, Other — ranked by count, with a cumulative-% line and an 80% reference line]
Tip: look at the data multiple ways — by frequency AND by cost. The Pareto priority may differ.
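A minimal Python sketch of the Pareto computation (the defect tallies are hypothetical):

```python
def pareto(counts):
    """Rank categories by count and attach cumulative % — the vital few
    are the categories needed to reach ~80% of all occurrences."""
    total = sum(counts.values())
    running = 0
    for cat, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        yield cat, n, 100 * running / total

# Hypothetical defect tallies
defects = {"Dimensional": 58, "Surface": 24, "Burrs": 9, "Colour": 6, "Other": 3}
prev_cum = 0
for cat, n, cum in pareto(defects):
    flag = "  <- vital few" if prev_cum < 80 else ""
    print(f"{cat:<12} {n:>3}   cum {cum:5.1f}%{flag}")
    prev_cum = cum
```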
Step 3 · Root Cause Confirmation

Cause-and-Event Tree

A hierarchical diagram showing connections between causes at different levels. Used to prune possible causes, reveal compound causes, and trace pathways from event back to root.

[Cause-and-event tree: TOP EVENT "Product Failure" branches into Design Failure (Cause A → no design review, tolerance error) and Process Failure (Cause B → no SOP, no training); the lowest-level causes are the minimal cut sets — the root causes requiring action]
Use when: multiple independent causes exist or when cause chains are complex and branching.
Step 4 · Solution Selection

Impact / Effort Matrix

Plot each potential solution on a 2×2 grid: impact (high/low) vs effort (high/low). Quick wins sit in high-impact, low-effort. Avoid low-impact, high-effort.

[Impact/Effort grid: high impact + low effort = QUICK WINS ⭐; high impact + high effort = MAJOR PROJECTS; low impact + low effort = FILL-INS; low impact + high effort = AVOID ✗]
Involve the people who must implement — their effort estimate is the one that matters.
Step 5 · Implementation

Force Field Analysis

Lists forces driving the change against forces restraining it. Helps teams plan how to amplify driving forces and reduce resistance before implementation begins.

[Force field diagram: driving forces — management support, clear cost savings, motivated team — push toward the change target; restraining forces — budget constraints, resistance to change, skills gap — push back]
Key insight: reducing restraining forces is usually more effective than amplifying driving forces.

8 Mistakes That Kill an RCA

Stopping at the symptom

"We fixed the jam" — without asking why the jam happened or why it wasn't prevented.

Declaring success too early

Stopping at the physical cause — "we replaced the part." The system that allowed it to fail is unchanged.

Blaming people

"Operator error" is never a root cause. It is always a prompt to ask: what system failed to prevent or catch the human error?

Vague problem definition

"The process is slow." Without specifics — what, where, how often, at what cost — the team will solve different problems.

Speculation before data

Teams jump to "I think it's X" before mapping the process or collecting evidence. Confirmation bias sets in.

Ignoring compound causes

Many problems have multiple independent causes — fixing one doesn't eliminate the other. Each branch needs its own "why" chain.

Skipping the pilot

Implementing a solution at full scale without testing it first. If it doesn't work, the cost and disruption are multiplied.

Not measuring the result

Without returning to the Step 1 metrics after implementation, you never know if the root cause was truly found and fixed.

Sources: Andersen & Fagerhaug, ASQ Pocket Guide to Root Cause Analysis (ASQ Quality Press, 2014) · Duke Okes, Root Cause Analysis: The Core of Problem Solving and Corrective Action, 2nd ed. (ASQ Quality Press, 2019)
Quality Systems

QMS certification, PPAP/APQP, special characteristics, 8D problem-solving with hard deadlines, and supplier performance management — the complete automotive supply chain quality framework.

QMS Operating Standard

Quality, Cost & Delivery — Zero Defect is Not Aspirational

Every supplier QMS must deliver green-rated performance across QCD. These are operational standards with zero tolerance on Safety & Regulation requirements.

Zero-Defect Core Objectives

0 PPM Strategy

Zero defective parts shipped to the customer. No acceptable defect rate — the target is absolute prevention, not statistical tolerance.

0 Tolerance — S/R Requirements

Safety and Regulation characteristics carry absolute zero tolerance. No sampling plan, no concession, no deviation permitted.

0 IPB Strategy

Zero Incidents per Billion — the field performance target for safety-critical systems. Drives design robustness requirements upstream.

Green Supplier Scorecard

Supplier Self Assessment (SSA) fully compliant. Maintained green status on the OEM Supplier Scorecard across quality, delivery, and responsiveness metrics.

QMS Certification Progression

📊 QMS Maturity Ladder — ISO 9001 to IATF 16949 (3rd Party)
  • Level 1 — Foundation: ISO 9001, 3rd-party certified
  • Level 2 — Customer Aligned: ISO 9001 + CSRs, MAQMSR aligned (↑ add CSRs)
  • Level 3 — Automotive Grade: IATF 16949, 2nd-party audit (↑ extend to IATF scope)
  • Level 4 ★ — Gold Standard: IATF 16949, 3rd-party certified — the OEM target (↑ add 3rd-party certification)

AIAG Core Tools — All Five Required in Every Supplier QMS

  • APQP — Advanced Product Quality Planning
  • PPAP — Production Part Approval Process
  • FMEA — Failure Mode & Effects Analysis
  • MSA — Measurement System Analysis
  • SPC — Statistical Process Control
📋 Record Retention

Maintain quality records — retrievable and legible — for the life of the program. Applies to sub-suppliers.

Non-conforming product records retained for trend analysis per AIAG / ISO 9001 / IATF 16949.

🌿 Environmental Requirements
  • MINIMUM: all applicable local government regulations met.
  • TARGET: ISO 14001 Environmental Management System or equivalent.
  • NEW OEM suppliers: achieve ISO 14001 certification within 3 years of first order.
Production Part Approval Process

PPAP — Proving Production is Ready Before It Starts

PPAP is the supplier's formal proof that the production process can consistently make conforming parts at the quoted rate. It is not a one-time paperwork exercise — it is evidence of process understanding. Level 3 is the default: PSW + complete 18-element data package.

The 18 PPAP Elements — What Every Package Must Contain

The AIAG PPAP manual (4th edition) defines 18 elements. Which elements are required for submission depends on the Level (1–5) — but the supplier must generate all elements internally regardless of what is submitted to the customer.

1 Design Records

All drawings (CAD/2D), specifications, and engineering change documents. If supplier owns design: DFMEA required. Customer-owned design: drawings provided by customer.

2 Authorised Engineering Change Documents

All open engineering changes not yet incorporated into the design record. Must show written customer authorisation. Includes ECNs, deviation permits, and waivers.

3 Customer Engineering Approval

Written approval from the customer engineering activity — typically a signed prototype or pre-production buy-off. Required before production tooling is committed.

4 Design FMEA (DFMEA)

Required when supplier owns the design. Documents all potential failure modes of the design and their effects. Severity, Occurrence, and Detection ratings. Must be live — not a snapshot.

5 Process Flow Diagram

Step-by-step flow of the entire production process — from incoming material through shipping. Must match the Control Plan and PFMEA. Includes all operations, inspections, and rework loops.

6 Process FMEA (PFMEA)

Risk analysis of the manufacturing process — not the design. Documents how each process step can fail, its effect on the product, and controls in place. Drives the Control Plan. RPN threshold typically ≤100.

7 Control Plan

Three phases required: Prototype, Pre-Launch, and Production. Documents every control method for each characteristic — measurement method, frequency, sample size, reaction plan. The living document of process control.

8 Measurement System Analysis (MSA)

GR&R studies for all gauges measuring CCs and SCs. Typically 3 operators × 10 parts × 2 trials. %GRR <10% preferred; <30% conditionally acceptable; >30% — gauge must be improved before PPAP.

9 Dimensional Results

Full balloon-drawing inspection of a minimum 6 parts (or per customer requirement). Every characteristic on the print — not just CCs. Results shown in table format with nominal, tolerance, and actual measured values.

10 Material & Performance Test Results

Test results for all material specifications (tensile, hardness, chemical composition) and functional performance tests (fatigue, pressure, thermal cycling). Must include lab certification and traceability to production material.

11 Initial Process Studies (Cpk)

SPC data from the PPAP production run for all CCs and SCs. Minimum 25 subgroups / 100 data points. Cpk ≥ 1.67 required for initial study. If not achieved: 100% inspection mandatory until Cpk improves.

12 Qualified Laboratory Documentation

Scope of accreditation for all labs performing tests (internal or external). ISO/IEC 17025 accreditation preferred. Must show the tests performed are within the lab's accredited scope.

13 Appearance Approval Report (AAR)

Required only for parts with appearance specifications (colour, texture, gloss, surface finish). Customer sign-off on physical colour/texture masters. AAR is a separate customer approval — not a dimensional check.

14 Sample Production Parts

Typically 6 production parts from the PPAP run (or per customer CSR). Must be from production tooling, at production rate, using production materials. Not prototype or pre-production parts.

15 Master Sample

One part signed off by both supplier and customer. Retained at the supplier (or customer if required) as the reference standard for appearance, dimensions, and functional acceptance criteria throughout the programme.

16 Checking Aids

All part-specific gauges, fixtures, jigs, and templates used for inspection. Must be documented and calibrated. Checking aid drawings and calibration records submitted where required by the customer.

17 Customer-Specific Requirements

Any additional requirements from the OEM Customer Specific Requirements (CSRs). Each OEM publishes their own CSR supplement — e.g. GM BIQS, Ford Q1, Stellantis Supplier Quality. These override the standard PPAP manual where they conflict.

18 Part Submission Warrant (PSW)

The cover document — supplier's declaration that the submitted parts meet all requirements and the package is complete. Signed by authorised supplier representative. No PPAP is valid without a signed PSW. This is Element 18 and the final gating document.


PPAP Submission Levels — What You Send vs What You Keep

The Level defines what is physically submitted to the customer. All 18 elements must be generated and retained at the supplier site regardless of level.

Level | What is submitted to customer | When used
1 | PSW only (warrant only, no data) | Non-critical, commodity parts; customer waives data submission
2 | PSW + limited supporting data + samples | Low-risk parts; customer selects specific elements to review
3 | PSW + complete data package (all 18 elements) | Default level — used unless customer specifies otherwise
4 | PSW + other requirements as defined by customer | Customer specifies exactly what additional data is required beyond PSW
5 | PSW + complete package reviewed at supplier's manufacturing site | New suppliers, new processes, high-risk parts — customer sends team to supplier
📊 Cpk Requirements
Characteristic | Study type | Min Cpk
Critical Characteristic (CC) | Initial PPAP | ≥ 1.67
CC / SC | Ongoing production | ≥ 1.33
Below target | Any | 100% inspect
⚠️ 90-Day Change Rule

All changes require a minimum of 90 days' advance notice and written approval before implementation.

A new PPAP with PSW is required before serial production resumes after any approved change.

Triggers: manufacturing location change · material change · design change · tooling inactive 12+ months · sub-supplier change

🔗 APQP — What Feeds the PPAP Package

PPAP is the output; APQP is the process that generates it. These APQP deliverables directly populate the 18 elements:

▸ Process Flow → Element 5
▸ DFMEA → Element 4
▸ PFMEA → Element 6
▸ Control Plan (3 phases) → Element 7
▸ MSA / GR&R → Element 8
▸ Initial Cpk studies → Element 11

Special Characteristics — CC / SC / IC

Must appear on all supplier Process Flow Diagrams, FMEAs, and Control Plans. Identified by symbols on engineering drawings.

CC — Critical

Critical Characteristic

Affects government regulation compliance or safety. Any deviation could endanger the end user.

REQUIRED
  • Process performance studies + ongoing monitoring per Control Plan
  • Cpk > 1.33 (initial: 1.67) or 100% inspection
RECOMMENDED
  • 100% automatic control + poka-yoke + SPC
SC — Significant

Significant Characteristic

Important for customer satisfaction. Affects fit, functionality, durability, or processing.

REQUIRED
  • Process performance studies + ongoing monitoring
  • Cpk > 1.33
RECOMMENDED
  • 100% automatic control + poka-yoke + SPC
IC — Important

Important Characteristic

Identified by expert knowledge as important product/process parameter for quality performance.

REQUIRED
  • Process performance studies at initial and subsequent part submissions only

8D Problem Solving

Structured 8-step approach — find and eliminate the systemic weakness that allowed the problem to occur, not just fix the symptom.

D1 · Within 24h

Problem Description & Team

Define the problem with data. Assemble cross-functional team with relevant expertise. Launch immediately.

⏰ 24 hours from initial complaint
D2 · Within 24h

Problem Definition

Quantify with data. Is/Is-Not analysis. Define what is wrong, where, when, how much.

⏰ 24 hours from complaint
D3 · Within 48h — HARD DEADLINE

Containment Actions

Protect the customer immediately. Document D3 actions and verify their effectiveness.

⏰ 48 hours — non-negotiable
D4 · Within 10 working days

Root Cause Analysis

Identify root cause for occurrence AND non-detection. Use 5-Why, fishbone, fault tree.

⏰ 10 working days
D5 · Within 10 working days

Define Corrective Actions

Select best permanent corrective action. Define implementation plan with owners & dates.

⏰ 10 working days
D6 · Within 30 working days

Implement & Verify

Confirm actions implemented. Provide evidence (photos, data, updated documents). Verify effectiveness with data.

⏰ 30 working days
D7 · Per D5 plan (≤90 days)

Prevent Recurrence

Update FMEAs, Control Plans, Process Flow, work instructions, training. Apply lessons to similar processes.

⏰ Typically ≤ 90 days total
D8 · Per D5 plan (≤90 days)

Official Closure

Confirm effectiveness, remove containment, officially close, recognize the team, file the report.

⏰ Official closure ≤ 90 days

The supplier must communicate at D3, D5, and D8. When D8 takes more than 90 days, weekly reviews with the SQR are expected. A written response is required for all chargebacks, even disputed ones.

Supplier Performance Evaluation

Expectation: zero (0) defects. Performance tracked across KPI categories for volume allocation, global expansion, and future business decisions.

Scorecard KPIs

KPI | Target
Delivered Product Quality (PPM) | 0
Delivery Schedule Performance | 100%
8D On-Time Completion | Required
QMS Certification Level | IATF
PPAP On-Time Approval Rate | ≥ 98%

Response Time Requirements

Milestone | Deadline | Deliverable
Initial response | 24 hours | Problem description + team launch
D3 Containment | 48 hours | Containment actions confirmed in place
D5 Root Cause | 10 working days | Root cause + corrective action plan
D6 Implementation | 30 working days | Actions confirmed + supporting evidence
D8 Closure | ≤ 90 days | Official 8D closed & filed

🚨 Escalation Model

1
Normal

NCR Tracking

Non-conformances tracked, action plans monitored.

2
Elevated

Increased Oversight

Weekly reviews, SQR direct involvement.

3
Critical

Special Status

Customer notification, audit scheduled.

4
Severe

Business Hold

No new business awards. Potential disqualification.

Glossary

APQP — Structured methodology defining steps to ensure products satisfy customers. Covers design/development, process design, product/process validation, and feedback/corrective action.

PPAP — Defines requirements for production part approval, including bulk materials. Confirms that customer requirements are understood and the process can consistently produce conforming product at the quoted rate.

PSW — Authorizes serial production. Contains supplier/part info, required documentation, and disposition. An approved PSW is required before the first serial production shipment.

FMEA — Proactive risk management tool. Identifies potential failure modes, their effects, and causes. DFMEA is required when the supplier owns the product design; PFMEA covers process failures. RPN = Severity × Occurrence × Detection.

SNCR / SCB — SNCR (Supplier Non-Conformance Report) issued when a plant receives out-of-spec material — triggers an 8D. SCB (Supplier Charge Back) recovers costs: extra freight, line stoppages, rework, sort, scrap, travel, recalls.

REACH / SVHC — Suppliers must screen ECHA publications at least twice per year. Submit Article 33 information to customers if products contain SVHC above 0.1% w/w. Safety Data Sheets required per Art. 31 of the EU REACH Regulation.

FIFO — Inventory practice ensuring the oldest stock is shipped first. Prevents obsolete material reaching the customer. Mandatory for all suppliers. Shelf-life limits must be monitored and respected at all times.

ISO 9001:2015 — The Complete Quality Management System Standard

ISO 9001:2015 is the world's most widely adopted quality management system standard. It has evolved from the prescriptive 20-element model of the 1987 and 1994 editions to a risk-based, process-driven framework built on Annex SL's High Level Structure, enabling integration with ISO 14001, ISO 45001, and other management system standards.

ISO 9001 Revision History

Year | Edition | Key change
1987 | 1st issue | First international QMS standard — prescriptive 20-element model
1994 | 2nd issue | Minor updates, maintained 20-element structure
2000 | 3rd issue | Major restructure — process approach introduced, 8 sections
2008 | 4th issue | Clarifications only — no new requirements added
2015 | 5th issue (current) | Annex SL structure, risk-based thinking, no Quality Manual required, no Management Representative, no Preventive Action clause

Key Changes: 2008 → 2015

ISO 9001:2008 term | ISO 9001:2015 term
Products | Products and services
Documentation / Records | Documented information
Work environment | Environment for the operation of processes
Purchased product | Externally provided products and services
Supplier | External provider
Annex SL High Level Structure — identical clause numbering across all ISO management system standards (14001, 45001, 13485, etc.) enabling integrated management systems.
Risk-Based Thinking — replaces the old "Preventive Action" clause. Risk is now embedded throughout planning (§6.1) and operations.
No Quality Manual required — organisations may choose to maintain one, but §4.3 scope documentation replaces the mandatory manual.
No Management Representative — responsibility for QMS is now part of top management's role, not a delegated position.
No Exclusion Clause — §4.3 requires justification for any non-applicable requirements rather than allowing simple exclusions.

ISO 9001:2015 — 10-Section Structure (PDCA)

📐 ISO 9001:2015 Structure — PDCA Mapping
  • §1–3: Scope, Normative References, Terms
  • §4 Context (Context): organization & its context, interested parties, scope, processes
  • §5 Leadership (PLAN): top management commitment, policy, roles
  • §6 Planning (PLAN): risks & opportunities, quality objectives, planning of changes
  • §7 Support (DO): resources, competence, awareness, communication, documented information
  • §8 Operation (DO): planning & control, design, external providers, production, release
  • §9 Performance (CHECK): monitoring & measurement, internal audit, management review
  • §10 Improvement (ACT): nonconformity, corrective action, continual improvement

Document Control (ISO 9001:2015 §7.5) & Configuration Management (ISO 10007)

Documentation Hierarchy

Level | Document type | Contains | ISO 9001:2015
1 | Quality Manual | System overview, scope, policy | No longer mandatory
2 | Procedures | High-level process overview — multi-discipline, no detailed "how" | "Documented information"
3 | Work Instructions | Step-by-step "how the work is done" | Retain as evidence
4 | Forms / Records | Empty = document; filled = record | Protected from alteration
💡

§7.5.3 requires documented information to be: available and suitable for use when needed, and adequately protected. Control activities include distribution, version control, storage, retention, and disposition.

Configuration Management (ISO 10007:2017)

Configuration management ensures product integrity over time by systematically controlling changes to the interrelated functional and physical characteristics of a product.

Step | Activity
Identification | Define and label all configuration items (part numbers, revision levels)
Change Control | Formal review and approval before any change is implemented
Status Accounting | Record and report on the current state of all configuration items
Audit | Verify actual product matches documented configuration baseline
📋

Example: Product version A = Part A rev 0 + Part B rev 1 + Part C rev 7. Version B = Part A rev 0 + Part B rev 2 + Part C rev 7. Change control ensures version B is formally released before production switches.

ISO 9001 Certification Chain

📐 The Three-Tier Certification Chain
IAF — International Accreditation Forum: voluntary association of Accreditation Bodies; provides global confidence and consistency.
  ↓
Accreditation Body (AB): certifies that the Certification Body follows good practice. USA: ANAB · UK: UKAS · Standard: ISO/IEC 17011.
  ↓
Certification Body (CB) / Registrar: evaluates your QMS and issues the ISO 9001 certificate. Third-party auditing company · Standard: ISO/IEC 17021.

Core vs Support Processes

Core Processes

Processes that must be performed and have significant direct impact on the organisation's success and ability to meet customer requirements.

Examples: order processing, product design, manufacturing, delivery, customer service

Support Processes

Processes that do not directly create value for the customer but are necessary for the core processes to operate.

Examples: maintenance, purchasing, IT, HR, training, calibration

Process Approach (ISO 9001:2015 §4.4)

ISO 9001:2015 explicitly requires a process approach. Processes are defined by their inputs, outputs, interrelationships, and alignment with the strategic plan.

INPUT → PROCESS → OUTPUT
↑_______________________↑
Feedback loop

Quality Audits — Complete Reference

ISO 19011:2018 provides guidelines for auditing management systems. Audits are systematic, independent, documented processes for obtaining evidence and evaluating it objectively to determine the extent to which audit criteria are fulfilled.

Audit Types — Two Classification Systems

By Scope

Type | Scope | Purpose
System Audit | Comprehensive — multiple processes and their interactions | Overall QMS conformance
Process Audit | One specific process, activity, or function | Compare actual process to documented requirements
Product Audit | A specific product or batch | Assess "fitness for use" — does product meet design requirements?

By Party

Party | Conducted by | When
1st Party | Internal — organisation audits itself; auditors have no vested interest in the area audited | Ongoing internal improvement
2nd Party | Customer — audits its supplier before or after awarding a contract | Supplier qualification, surveillance
3rd Party | Independent audit organisation — free from any conflict of interest in the customer-supplier relationship | ISO certification, regulatory compliance

Special type | Description
Registration Audit | Third-party audit to obtain ISO 9001 (or other standard) certification
Compliance Audit | Confirms conformance to a specific standard or procedure. Differs from improvement audits — focuses on evidence of conformance, not performance improvement.

Audit Participants — Roles & Responsibilities

Role | Definition | Key responsibilities
Client | Organisation or person requesting the audit | Initiates audit · Defines purpose and scope · Provides resources · Receives report · Determines distribution · Decides on actions
Lead Auditor | Auditor responsible for leading the audit team | Develops and communicates audit plan · Assigns roles · Chairs opening and closing meetings · Ensures team stays on track · Issues report and follow-up
Auditor | Person who conducts the audit | Understands purpose and scope · Plans audit · Collects and analyses evidence · Reports findings · Follows up actions
Auditee | Organisation or individual being audited | Informs staff · Provides resources and escorts · Shows objective evidence · Cooperates · Determines and initiates corrective actions
Technical Expert | Person who provides specific knowledge or expertise to the audit team | Supports auditors with specialist knowledge — not an auditor themselves
Observer | Accompanies the audit team but does not audit | May be a trainee auditor or a regulatory observer — no active role in the audit
Guide | Person appointed by the auditee to assist the audit team | Facilitates access, escorts, helps with logistics — does not influence audit findings

The Audit Process — Six Stages

📐 Audit Process Flow (ISO 19011:2018)
① Planning & Preparation: objectives, scope, checklist
② Opening Meeting: introductions, scope confirmed
③ Audit Interviews: collect & analyse evidence
④ Closing Meeting: present findings to auditee
⑤ Audit Reporting: accurate, objective, timely
⑥ Follow-up & Closure: CA/PA verified, records retained

Audit Report — 7 Quality Characteristics

🎯
Accurate

Free from errors and distortions — purpose clearly communicated

⚖️
Objective

Fair, impartial, and unbiased — evidence-based conclusions only

💡
Clear

Easy to understand, logical flow — no ambiguous language

✂️
Concise

Straight to the point — no unnecessary detail or padding

🔧
Constructive

Helps the client improve — practical, actionable recommendations

📋
Complete

Includes all relevant facts — nothing important omitted

⏱️
Timely

Well-timed to enable decisions on recommendations — not delayed

Follow-up Actions

Correction — fix the immediate problem
Corrective Action — eliminate the root cause
Preventive Action — prevent potential future issues
Effectiveness is verified, possibly in a subsequent audit.

Cost of Quality & Quality Training

Cost of Quality — The Four Categories

Management understands the language of money. Quantifying the cost of quality justifies spending on prevention and improvement activities, and sets measurable targets. Every pound/dollar spent on prevention reduces the much larger internal and external failure costs.

✅ Prevention Cost — Doing it Right

Money spent to prevent defects from occurring in the first place. The highest-ROI category — a widely cited heuristic holds that every £1 spent on prevention saves £10–£100 in failure costs.

  • Quality planning and system development
  • Education and training (SPC, FMEA, statistical methods)
  • Design reviews and FMEA
  • Supplier reviews and qualification
  • Quality system audits
  • Process planning and capability studies
🔍 Appraisal Cost — Finding Defects

Money spent on inspecting and testing to detect defects. Necessary but non-value-adding — the goal is to reduce the need for appraisal by improving prevention.

  • Test and inspection (receiving, in-process, final)
  • Supplier acceptance sampling
  • Product audits
  • Calibration of measurement equipment
⚠️ Internal Failure Cost — Found Before Shipping

Cost of defects discovered before the product reaches the customer. Painful but preferable to external failures.

  • In-process scrap and rework
  • Troubleshooting and repair
  • Design changes caused by quality problems
  • Extra inventory to buffer poor yields
  • Re-inspection and retest of reworked items
  • Downgrading (selling at lower price)
🔥 External Failure Cost — Found by Customer

The most expensive category — defects discovered after delivery. Includes not just direct costs but reputational damage and lost future business.

  • Sales returns and allowances
  • Service level agreement penalties
  • Complaint handling and investigation
  • Warranty field labour and parts
  • Recalls
  • Legal claims and litigation
  • Lost customers and business opportunities
💡

Visible COPQ (above the waterline): rejection, rework, repair, inspection costs — easily measured

Invisible COPQ (iceberg below waterline): lost sales, excess inventory, additional controls and procedures, complaint investigation, legal fees, customer dissatisfaction — hard to quantify but often much larger

Optimum Quality Cost Model

Traditional Model (Older View)

Assumed that improving quality beyond a certain level leads to increasing costs — there was an "optimal" defect rate where prevention + appraisal costs balanced failure costs. This model suggested that 100% quality was too expensive.

Modern Model (Current View)

Quality improvement consistently leads to cost reduction — there is no point of diminishing returns. Higher quality means fewer failures, less rework, less inspection, less warranty. Crosby's "Quality is Free" thesis is supported by this model.

Quality Training — ADDIE Model

The ADDIE model is the standard instructional design framework for developing quality training programmes. It provides a systematic approach to ensure training is effective, relevant, and measurable.

A
ANALYSE

Learning environment, learners' existing knowledge, needs analysis, gap assessment

D
DESIGN

Learning objectives, exercises, content structure, lesson planning, media selection

D
DEVELOP

Create and assemble the content, materials, and resources

I
IMPLEMENT

Deliver the curriculum — method of delivery, testing procedures, actual training

E
EVALUATE

Collect feedback, measure outcomes, refine the programme

Kirkpatrick Model — 4 Levels of Training Effectiveness

Donald Kirkpatrick's four-level model (1959, still the industry standard) provides a framework for evaluating whether training actually achieves its intended purpose. Levels build on each other — you must satisfy Level 1 before Level 2 matters, and so on.

Level | Name | What is measured | How measured | Quality context
1 | Reaction | The degree to which participants find the training favourable, engaging, and relevant to their jobs | Post-training surveys, smile sheets, immediate feedback forms | Did quality engineers find the SPC training useful and applicable to their work?
2 | Learning | The degree to which participants acquired the intended knowledge, skills, attitude, confidence, and commitment | Pre/post knowledge tests, skill demonstrations, simulations | Can engineers now correctly calculate Cpk and interpret control chart signals?
3 | Behaviour | The degree to which participants apply what they learned when back on the job | Observation on the job, supervisor assessments, 90-day follow-up | Are engineers actually using SPC charts and reacting to out-of-control signals?
4 | Results | The degree to which targeted outcomes occur as a result of the training and the support package | Business metrics — scrap rate, Cpk improvement, DPMO reduction, COPQ reduction | Has the quality of shipped products improved as a result of the SPC training programme?
💡

Most organisations only measure Level 1 (satisfaction surveys) and stop there. True training effectiveness requires measuring Level 4 business results — which is the only way to justify the training investment. For quality engineers, the ROI metric is usually COPQ reduction.

Product & Process Control — Material, Nonconformance & HACCP

Section IV of the quality engineering Body of Knowledge covers the practical controls applied during production — from hazard analysis through material identification, segregation, nonconformance handling, and corrective action.

Documentation Hierarchy — Quality System Pyramid

📐 Quality System Documentation Levels
  • Level 1 — Quality Manual
  • Level 2 — Procedures (high-level process overview)
  • Level 3 — Standard Operating Procedures (SOPs)
  • Level 4 — Work Instructions (step-by-step how)
  • Level 5 — Records (completed forms = evidence of compliance)
Quality Manual

System overview, scope, policy. Not mandatory under ISO 9001:2015 but still widely used.

Procedures

High-level process overview — multi-discipline, does not include detailed "how". Answers WHAT and WHO.

SOPs & Work Instructions

Step-by-step detail of how work is performed. SOPs describe a process; WIs describe a task within a process.

Records

Empty form = document. Filled-in form = record. Records provide evidence of compliance and must be protected from unintended alteration.

HACCP — Hazard Analysis Critical Control Point

HACCP is a systematic preventive approach to food safety. It identifies physical, chemical, and biological hazards in production processes and establishes key limits to reduce these risks. The underlying goal: preventing problems from occurring is better than correcting them after the fact. The term "Critical Control Point" (CCP) is widely borrowed beyond food — it refers to any point where failure of the SOP could cause harm to customers or the business.

# | HACCP Principle | What it means
1 | Hazard Analysis | Identify all potential hazards (biological, chemical, physical) at each process step
2 | CCP Identification | Determine which steps are Critical Control Points — where control is essential to prevent/eliminate a hazard
3 | Critical Limits | Establish the maximum/minimum values (e.g. minimum cooking temperature) that must be met at each CCP
4 | Monitoring Procedures | Define how and how often each CCP will be monitored to ensure critical limits are met
5 | Corrective Actions | Specify actions to take when monitoring indicates a CCP is not under control
6 | Verification Procedures | Confirm the HACCP system is working effectively — audits, testing, record reviews
7 | Record Keeping | Maintain documentation of monitoring, deviations, corrective actions, and verification activities
CCP Examples (food industry)
  • 🌡️ Thermal processing — cooking temperature/time
  • ❄️ Chilling — storage temperature control
  • 🧪 Testing ingredients for chemical residues
  • ⚖️ Product formulation control
  • 🔩 Testing product for metal contaminants
💡

A CCP is the "stop sign" of the process — the point where if the control fails, the hazard reaches the customer. Not every process step is a CCP; only those where control is critical to safety or product integrity.

Material Identification, Status & Traceability (ISO 9001:2015 §8.5.2)

Identification

Ability to determine that the specified material grade and size are being used at every stage.

PMI (Positive Material Identification) — mandatory physical test for critical materials (e.g. alloy verification for pressure vessels, pipelines)

Status

Material must be clearly labelled with its current disposition status:

✅ APPROVED — cleared for use
⏳ QUARANTINE — awaiting decision
✗ REJECTED — do not use
Traceability (ISO 9000:2015)

Ability to identify a specific item throughout its life and link it to its Mill Test Report (MTR). Covers: origin of materials and parts, processing history, distribution and location after delivery.

ISO 9001:2015 §8.5.2:
Organisation shall control unique identification of outputs when traceability is required, and retain documented information to enable traceability.

Material Segregation & Classification

Material Segregation

Physical separation of materials to prevent mixing, cross-contamination, or unintended use. Key segregation categories:

  • ✓ Pass / Fail separation at inspection
  • ⏳ Quarantine area — material pending review decision
  • 🏷️ Different material classes (e.g. Carbon Steel vs Stainless Steel — must never mix)
Material Classification — Defect vs Nonconformity
Term | ISO 9000:2015 definition
Nonconformity | Non-fulfilment of a requirement. Broader term — includes any deviation from spec, process, or standard.
Defect | Nonconformity related to an intended or specified use. Defects adversely affect the functionality of the product. All defects are nonconformities, but not all nonconformities are defects.
💡

Use "nonconformity" in contractual/legal contexts (safer). Use "defect" only when the functionality impact is confirmed.

Nonconforming Outputs — ISO 9001:2015 §8.7

§8.7 requires that nonconforming outputs be identified and controlled to prevent unintended use or delivery. The organisation must take action based on the nature and effect of the nonconformity — including after delivery.

§8.7 Disposition option | What it means
a) Correction | Rework, repair, or reprocess to make the output conform
b) Segregation / Containment / Return / Suspend | Physically separate, return to supplier, or stop provision of service
c) Inform the customer | Notify the customer that nonconforming product may have been delivered
d) Accept under concession | Release with customer or relevant authority authorisation — documented deviation
💡

After correction, conformity must be re-verified before release. All dispositions must be documented (retain the documented information).

Corrective Action — ISO 9001:2015 §10.2

When a nonconformity occurs (including a complaint), the organisation must react and take corrective action to eliminate the root cause:

Step | Requirement
a) | React to the nonconformity — contain, correct immediately
b) | Evaluate the need to eliminate root cause(s) — to prevent recurrence
c) | Implement any needed action
d) | Review the effectiveness of the corrective action taken
e) | Update risks and opportunities if necessary
f) | Make changes to the QMS if necessary
💡

Correction vs Corrective Action: Correction fixes the immediate problem (rework). Corrective Action eliminates the root cause (process change) to prevent it recurring. Only CA prevents future occurrences.

Corrective Action Process — Problem Solving Steps

Step | Activity
1. Problem Identification | Define and quantify the problem clearly — what, where, when, how often, how much
2. Failure Analysis | Analyse the failure — what failed and how. Reproduce the failure if possible.
3. Root Cause Analysis | Identify the true root cause — use 5-Why, Fishbone, or fault tree. Address the system, not just the symptom.
4. Problem Correction | Implement the corrective action — change the process, design, procedure, or training to eliminate the root cause
5. Recurrence Control | Implement controls to prevent recurrence — update FMEA, control plan, WI, training records
6. Verification of Effectiveness | Confirm the CA worked — monitor KPIs, check DPMO, audit the new process. Close only when effectiveness is confirmed.
Preventive Action Tools
  • 🔒 Error proofing / Poka-Yoke
  • 🛡️ Robust Design (Taguchi parameter design)
  • 📋 QMS — ISO 9001:2015
  • 📊 FMEA — proactive risk identification
  • 🏭 Lean thinking — 5S, standard work
💡

Correction vs CA vs PA (ISO 9000:2015): Correction = fix this defect now. Corrective Action = eliminate the cause so it doesn't recur. Preventive Action = eliminate the cause of a potential (not yet occurred) problem.

Seven Basic Quality Tools

Introduced by Kaoru Ishikawa in the 1960s, these seven tools form the foundation of quality problem-solving. All Quality Circle members are trained to use them. Together they move a team from raw data collection through root cause identification to ongoing process monitoring.

1

Check Sheet

A structured data-collection form used to manually tally and record the number of observations of specific events. It is the first tool applied — it creates the raw data that feeds every other tool.

When to use: At the start of any investigation. "What is happening, how often, and where?"
Key principle: Design the sheet before collecting data so it captures exactly what you need — category, time, location, shift.
Example — Water Bottle Manufacturing Defect Tally
Size | Scratch | Loose Cap | Label | Volume | Leakage | Total
300 ml | 2 | 4 | 1 | 1 | 3 | 11
500 ml | 3 | 4 | 2 | 1 | 2 | 12
1000 ml | 5 | 4 | 1 | 1 | 2 | 13
Sum | 10 | 12 | 4 | 3 | 7 | 36

Check Sheet → Pareto Chart: Loose Cap (12) is the #1 defect → fix first
2

Cause-and-Effect Diagram

Also called Fishbone or Ishikawa diagram. Graphically displays the relationship between an effect (the problem) and all possible causes, organised by the 6M categories.

6M Categories: Man · Machine · Method · Material · Measurement · Mother Nature (Environment)
Key principle: Qualitative tool — surfaces possible causes, not confirmed causes. Complement with data to validate. Invented by Kaoru Ishikawa (1943).
Fishbone Diagram — Water Bottle Fill Inconsistency
[Fishbone: effect "Inconsistent Fill Volume" on the spine; 6M branches carry the candidate causes — Man (fatigue, training), Machine (wear, calibration), Method (procedure, speed), Material (viscosity, temperature), Measurement (gauge R&R, resolution), Environment (humidity, temperature).]
3

Histogram

A bar chart displaying the distribution of measurements — the bars touch (continuous data). Quickly reveals the centre, spread, and shape of the data, providing clues to reducing variation.

Shape patterns to watch: Normal (bell) · Skewed left/right · Bimodal (two peaks — mixing two processes) · Uniform · Comb (measurement resolution too coarse)
Key distinction: Bars touch = continuous data (histogram). Bars separate = categorical data (bar chart). Never confuse the two.
Four Common Shapes — What Each Means
Normal: symmetric, process centred · Skewed Right: long tail right, median < mean · Bimodal: two peaks, mixing two processes · Uniform: all values equally likely. The histogram's shape guides the action: stratify, investigate mixing, or check centering.
4

Pareto Chart

Bars in descending order of magnitude with a cumulative percentage line. Based on the Pareto Principle (80/20 rule): approximately 80% of problems come from 20% of causes.

How to read it: Find where the cumulative % line crosses 80%. The bars to the left of that point are the "vital few" — address these first for maximum impact.
Pro tip: Use Stratification after Pareto — split the Pareto by machine, shift, or operator to reveal which sub-group is driving the top defect.
Pareto Chart — Water Bottle Defects (n=36)
[Pareto chart, n=36: Loose Cap 18, Label 8, Scratch 5, Leakage 3, Volume 2; cumulative line runs 50% → 72% → 86%, crossing the 80% reference between Label and Scratch. Vital few: Loose Cap + Label; trivial many: the rest.]
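A minimal sketch of the vital-few cut using the defect counts from this chart:

```python
counts = {"Loose Cap": 18, "Label": 8, "Scratch": 5, "Leakage": 3, "Volume": 2}
total, cum = sum(counts.values()), 0.0

for defect, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    cum += 100 * n / total
    tag = "vital few" if cum <= 80 else "trivial many"
    print(f"{defect:9s} {n:3d}  cum {cum:5.1f}%  ({tag})")
# Loose Cap (50.0%) and Label (72.2%) fall left of the 80% line: attack these first
```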
5

Scatter Diagram

A plot of one variable against another on an X-Y graph. Reveals the strength and direction of a relationship between two variables. Leads into regression analysis in DMAIC Analyse phase.

⚠️ Correlation ≠ Causation. A strong scatter pattern shows a relationship exists — not that X causes Y. Always ask if a third variable could be driving both.
5 patterns: Strong positive · Weak positive · No relationship · Weak negative · Strong negative
Scatter — Hours Studied vs Test Score (%)
[Scatter plot: hours studied (X) vs test score % (Y); r = 0.88 (strong positive); fitted line Y = 15.79 + 0.97X.]
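The r and fitted line above come from the plotted points, which aren't tabulated here; a sketch with illustrative (x, y) pairs shows how both are computed:

```python
import numpy as np

hours = np.array([10, 20, 25, 35, 45, 55, 60, 70])   # illustrative data
score = np.array([28, 34, 40, 49, 58, 70, 73, 85])

r = np.corrcoef(hours, score)[0, 1]             # Pearson correlation
slope, intercept = np.polyfit(hours, score, 1)  # least-squares line
print(f"r = {r:.2f};  Y = {intercept:.2f} + {slope:.2f}X")
```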
6

Control Chart

A line graph of measurements over time with statistically derived UCL and LCL. The most powerful of the 7 tools. Distinguishes common cause from special cause variation — tells the operator when to act and when to leave the process alone.

The golden rule: Reacting to common cause variation (tampering) makes the process WORSE. Only investigate and act on special cause signals (points outside UCL/LCL, runs, trends).
Special cause signals: Point beyond 3σ · 7-in-a-row same side · 7-in-a-row trend · 2 of 3 in Zone A · Stratification / Hugging
X̄-Chart — Fill Volume (cc) with Special Cause Signal
[X̄ chart of fill volume (cc) with zones A/B/C between CL and UCL/LCL: a 7-point upward trend (Rule 3, tool wear/drift) followed by a point beyond the UCL (Rule 1). React to special causes; never adjust for common cause noise.]
7

Stratification

Breaking data down into meaningful sub-categories (machine, shift, material, operator, time period) so patterns that are hidden in the combined data become visible.

When to use: When a Pareto or histogram of combined data doesn't explain enough. Ask: "Is this data actually from the same process?" Often the answer is no.
Classic example: Combined Pareto shows "Loose Cap" as #1. After stratification by shift, Shift 2 is responsible for 80% of all loose cap defects — pinpointing where to focus.
Stratification — Combined vs Split by Bottle Size
[Pareto stratified by bottle size: the combined chart (Cap 18, Label 8, Scratch 5, Leak 3, Vol 2) splits into distinct per-size patterns, revealing that the Loose Cap problem differs by bottle size and steering the root cause analysis to the right place.]

Seven Management & Planning Tools

The Seven Management and Planning Tools (7MP / New Seven Tools) complement the Basic 7 by handling qualitative, language-based, and planning data. Where the Basic 7 analyse numbers, the 7MP tools organise ideas, reveal relationships, and plan complex activities. They are particularly powerful in the early stages of DMAIC (Define/Measure) and for strategic planning.

1

Affinity Diagram (KJ Method)

Organises a large number of ideas, opinions, or facts into natural groupings by affinity (similarity). Developed by Japanese anthropologist Kawakita Jiro (KJ). Ideal after brainstorming when you have 20–200+ ideas to make sense of.

Process: Write each idea on a separate sticky note → silently group related ones together → give each group a header card that captures the theme → discuss emerging patterns.
Affinity Diagram — "How to Pass the Quality Engineer Exam"
[Affinity grouping: 20+ brainstormed ideas sorted silently into three natural groups — 📚 Content Mastery (detailed coverage, cover the BoK, slides & notes, handbook reference), ✏️ Active Practice (more quizzes, flash cards, practice problems, past exam papers), 🎯 Engagement Style (short videos, interactive, to the point, easy to understand).]
2

Tree Diagram

Breaks down a broad goal into progressively finer levels of detail. Reveals all the activities, tasks, and sub-tasks that must be accomplished to achieve the objective. Also used to show hierarchical structures.

Input to PDPC: The tree diagram becomes the starting structure for a PDPC — the next tool then adds "what could go wrong" and countermeasures to each task node.
Tree Diagram — Passing the Quality Engineer Exam
[Tree: the goal "Certified" breaks down into Motivation (organisational support, financial support), Resources (binder + handbook, video course, this reference tool; free YouTube vs paid Udemy course), and Practice (quizzes & mocks, timed practice) — goal → sub-goals → tasks → actionable activities.]
3

PDPC — Process Decision Program Chart

Identifies what could go wrong in a plan and develops countermeasures before problems occur. Similar to FMEA for project plans. Starts with a tree diagram and adds risk branches with labelled countermeasures: O = practical, X = impractical.

Key distinction from FMEA: PDPC is for project plans and new initiatives. FMEA is for product/process design. Both are proactive risk tools.
PDPC — Video Course Resource Planning
[PDPC: under the goal "Certified Exam", the Video Course branch carries the risks "too much irrelevant info" and "course goes off topic" with countermeasures O: curated playlist / X: skip to next course; the Practice Tests branch carries "not enough questions" and "poor answer explanations" with O: add ASQ mock exam / X: change provider. O = practical countermeasure, X = impractical countermeasure.]
4

Matrix Diagram

Shows the relationship between two or more groups by arranging them in rows and columns with relationship symbols at intersections. Multiple shapes: L-shaped (2 groups), T-shaped (3 groups), Roof-shaped (1 group vs itself — used in House of Quality).

QFD connection: The House of Quality uses an L-shaped matrix (VOC vs engineering requirements) and a roof-shaped matrix (engineering requirement interactions). The roof is a matrix diagram.
L-Shaped Matrix — Product vs Customer Criteria (1=weak, 5=strong)
Criteria (weight) | Product 1 | Product 2 | Product 3 | Product 4
Efficiency (0.3) | 2 | 3 | 5 | 2
Look (0.4) | 1 | 2 | 5 | 4
Comfort (0.2) | 2 | 1 | 4 | 2
Pickup (0.1) | 1 | 3 | 4 | 5
TOTAL (weighted) | 1.5 | 2.2 | 4.7 ✓ | 3.1

Product 3 wins on weighted criteria — objective, transparent decision-making
5

Interrelationship Digraph

Analyses cause-and-effect relationships between multiple factors in a complex situation. Unlike fishbone (one effect), the digraph handles multiple interconnected causes and effects simultaneously — ideal for chronic, systemic quality problems.

Reading the diagram: Count arrows in and out. Most outgoing arrows = root cause (driver). Most incoming arrows = key effect (outcome indicator). Focus improvement on root causes.
Node with most outgoing arrows is usually the best leverage point for change.
Interrelationship Digraph — Poor Quality (In/Out count shown)
[Interrelationship digraph for "Poor Quality": Lack of Mgmt Support (out: 4 → root cause / driver) feeds No Training (out 2 / in 2), No Calibration (out 1 / in 2), No Maintenance (out 1 / in 2), and No Procedures (out 2 / in 2), all converging on Poor Quality (in: 4 → outcome).]
6

Prioritisation Matrix

Compares and ranks choices against weighted criteria to select the best option objectively. Removes subjectivity from project selection, supplier choice, or design decisions. Each criterion has a weight (sum to 1.0), and each option is rated 1–5 against each criterion.

Formula per cell: Rating × Weight. Sum all weighted cells for each option — highest total wins. The weighting step is what differentiates this from a simple score matrix.
Prioritisation Matrix — Quality Improvement Project Selection
Criteria (weight) | Project A | Project B | Project C | Project D
Cost savings (0.4) | 3×0.4 = 1.2 | 2×0.4 = 0.8 | 5×0.4 = 2.0 | 1×0.4 = 0.4
Ease of impl. (0.3) | 4×0.3 = 1.2 | 5×0.3 = 1.5 | 4×0.3 = 1.2 | 3×0.3 = 0.9
Customer impact (0.2) | 2×0.2 = 0.4 | 3×0.2 = 0.6 | 4×0.2 = 0.8 | 5×0.2 = 1.0
Risk level (0.1) | 3×0.1 = 0.3 | 5×0.1 = 0.5 | 3×0.1 = 0.3 | 4×0.1 = 0.4
TOTAL | 3.1 | 3.4 | 4.3 ✓ | 2.7
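The whole technique is one weighted sum per option; a minimal sketch in Python using the project-selection numbers above:

```python
weights = {"Cost savings": 0.4, "Ease of impl.": 0.3,
           "Customer impact": 0.2, "Risk level": 0.1}
ratings = {  # option -> {criterion: rating 1-5}
    "Project A": {"Cost savings": 3, "Ease of impl.": 4, "Customer impact": 2, "Risk level": 3},
    "Project B": {"Cost savings": 2, "Ease of impl.": 5, "Customer impact": 3, "Risk level": 5},
    "Project C": {"Cost savings": 5, "Ease of impl.": 4, "Customer impact": 4, "Risk level": 3},
    "Project D": {"Cost savings": 1, "Ease of impl.": 3, "Customer impact": 5, "Risk level": 4},
}
for option, r in ratings.items():
    total = sum(r[c] * w for c, w in weights.items())
    print(f"{option}: {total:.1f}")   # A 3.1, B 3.4, C 4.3 (winner), D 2.7
```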
7

Activity Network Diagram (CPM / PERT)

Manages tasks in sequence to identify the critical path, bottlenecks, and float (slack). The Critical Path Method (CPM) finds the longest sequence of dependent tasks — delays on the critical path delay the entire project.

Float (Slack): Amount a task can slip without delaying the project. Activities on the critical path have zero float. Non-critical activities have positive float — they can slip without affecting the end date.
PERT extends CPM by using probabilistic time estimates: Expected Time = (Optimistic + 4×Most Likely + Pessimistic) / 6
Activity Network — Critical Path Highlighted (Red = Critical, Green = Float available)
[Activity network: Start (Day 0) → A: Requirements (2d) → B: Design (4d) → {C: Review (1d), D: Build (2d)} → E: Test & Deploy (7d) → End (Day 15). Critical path: Start → A → B → D → E = 15 days, zero float (red); C is non-critical with 4d float (green) and can slip up to its float without delaying the project.]
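A sketch of the forward pass that yields the project duration (the dependency structure is an assumption where the network is ambiguous), plus the PERT expected-time formula:

```python
# task -> (duration in days, predecessors); assumed from the network above
tasks = {"A": (2, []), "B": (4, ["A"]), "C": (1, ["B"]),
         "D": (2, ["B"]), "E": (7, ["C", "D"])}

finish = {}
for t in ["A", "B", "C", "D", "E"]:             # topological order
    dur, deps = tasks[t]
    finish[t] = max((finish[d] for d in deps), default=0) + dur
print(finish["E"])   # -> 15 days, matching the critical path A-B-D-E

def pert(optimistic, most_likely, pessimistic):
    """PERT expected time = (O + 4M + P) / 6."""
    return (optimistic + 4 * most_likely + pessimistic) / 6
print(pert(3, 5, 13))  # -> 6.0 expected days
```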
Statistical Process Control

SPC is manufacturing's early-warning system — detecting real process shifts before they become defects, while distinguishing true signals from random noise.

Cp, Cpk, Pp, Ppk — The Capability Family

Capability indices answer two separate questions: "Can the process fit within spec?" (Cp) and "Is it actually centred there?" (Cpk). The gap between them is your centering loss.

📊 Centered (Cp = Cpk) vs Off-Center (Cp > Cpk) — Same Process Spread
[Side-by-side capability curves with the same 6σ spread: centred process (μ mid-spec → Cp = Cpk = 1.33; 99.73% of output within ±3σ) vs off-centre process (μ shifted +1.33σ toward USL → only 2.67σ to USL, so Cpk = 0.89, while 5.33σ remain to LSL; the right tail beyond USL produces real defects).]
Cp — Potential (short-term)
Cp = (USL − LSL) ÷ 6σ
Assumes perfect centering. Answers: can the process fit?
Cpk — Actual (centred?)
Cpk = min[(USL−µ)/3σ, (µ−LSL)/3σ]
Accounts for centering. Answers: is it centered there?
Pp — Long-term Potential
Pp = (USL − LSL) ÷ 6s
Uses sample std dev s (not σ). Long-term spread.
Ppk — Long-term Actual
Ppk = min[(USL−x̄)/3s, (x̄−LSL)/3s]
Overall performance including all variation sources.
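A minimal Python sketch of these four indices (hypothetical fill-volume data; in practice the σ for Cp/Cpk comes from a within-subgroup estimator such as R̄/d₂, while Pp/Ppk use the overall sample standard deviation as shown here):

```python
import statistics

def cp(usl, lsl, sigma):
    """Potential capability: spec width over six sigmas."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mean, sigma):
    """Actual capability: distance from the mean to the nearest limit."""
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))

# Hypothetical fill-volume data (cc) against a 500 +/- 3 cc spec
data = [499.2, 500.1, 500.8, 499.7, 500.3, 499.9, 500.6, 499.5]
mean, s = statistics.mean(data), statistics.stdev(data)
print(f"Pp  = {cp(503, 497, s):.2f}")         # same formula as Cp, with s
print(f"Ppk = {cpk(503, 497, mean, s):.2f}")  # same formula as Cpk, with s
```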

Cpk Acceptance Thresholds

Cpk < 1.0
Incapable ✗
Cpk = 1.00
Marginal
Cpk = 1.33
Capable ✓
Cpk = 1.67
Highly Capable
Cpk ≥ 2.0
World Class
💡

Large Cp − Cpk gap? Fix centering first — not spread reduction. If Cp ≥ 1.33 but Cpk < 1.33, the process is capable of meeting spec but is running off-target. Adjust the mean before spending on variation reduction.

📋 Cpk Targets by Char. Type

Characteristic | Ongoing Min Cpk | Initial (PPAP)
CC (Critical) | ≥ 1.33 | ≥ 1.67
SC (Significant) | ≥ 1.33 | ≥ 1.67
General process control | ≥ 1.33 | ≥ 1.33

Cpk → Defect Relationship

Cpk | DPMO (approx)
1.00 | 2,700
1.33 | 63
1.50 | 6.8
1.67 | 0.57
2.00 | 0.002
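The table follows directly from the normal model: the nearest specification limit sits 3·Cpk standard deviations from the mean, and for a centred process both tails contribute. A quick check with scipy, assuming a centred process:

```python
from scipy.stats import norm

def cpk_to_dpmo(cpk):
    z = 3 * cpk                    # nearest spec limit is 3*Cpk sigmas away
    return 2 * norm.sf(z) * 1e6    # two-sided defects per million (centred)

for c in (1.00, 1.33, 1.50, 1.67, 2.00):
    print(f"Cpk {c:.2f} -> {cpk_to_dpmo(c):,.3f} DPMO")
# Reproduces the table above: ~2700, ~63, ~6.8, ~0.57, ~0.002
```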

Control Chart Selection Guide

The right chart depends on two things: data type (measured value vs pass/fail count) and subgroup size. Using the wrong chart gives misleading signals.

📊 Which Control Chart? — Decision Tree
  • Variables data (measured values) → choose by subgroup size n:
      · n = 1 → I–MR chart (Individuals & Moving Range)
      · n = 2–8 → X̄–R chart (most common; the automotive standard) ★
      · n > 8 → X̄–S chart (more efficient for large subgroups)
  • Attribute data (counts / pass-fail) → defective items or defects per unit:
      · Defective items → p chart (variable n) or np chart (fixed n)
      · Defects per unit/area → c chart (fixed area) or u chart (variable area)

X̄ chart monitors process mean (location); R chart monitors within-subgroup spread. Uses constants A₂, D₃, D₄ from standard tables. Best for rational subgroups of 2–8. Most common in PPAP control plans and IATF 16949 production environments.
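A sketch of the X̄–R limit calculation (the constants shown are the published values for subgroup size n = 5; the subgroup data is hypothetical):

```python
# Published control chart constants for n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

subgroups = [[10.1, 9.9, 10.0, 10.2, 9.8],
             [10.0, 10.1, 9.9, 10.0, 10.1],
             [9.8, 10.0, 10.2, 9.9, 10.1]]

xbars   = [sum(g) / len(g) for g in subgroups]
ranges  = [max(g) - min(g) for g in subgroups]
xbarbar = sum(xbars) / len(xbars)     # grand mean -> X-bar chart centreline
rbar    = sum(ranges) / len(ranges)   # average range -> R chart centreline

print(f"X-bar: CL={xbarbar:.3f}  UCL={xbarbar + A2 * rbar:.3f}  LCL={xbarbar - A2 * rbar:.3f}")
print(f"R:     CL={rbar:.3f}  UCL={D4 * rbar:.3f}  LCL={D3 * rbar:.3f}")
```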

Used when one measurement per cycle is all that's available: slow processes, destructive testing, chemical batches, daily lab results. Less sensitive to small shifts than X̄–R. Moving Range tracks point-to-point variation.

p-chart: proportion defective (variable subgroup size). np-chart: count defective (constant n). Both use the binomial distribution. Foundation of attribute acceptance sampling plans.

c-chart: total defect count per unit (constant inspection area). u-chart: defects per unit (variable inspection area). Both based on the Poisson distribution. Examples: scratches per panel, solder defects per board, paint runs per door.

Western Electric / Nelson Out-of-Control Rules

These 8 patterns on a control chart each indicate a special cause of variation — something changed in the process. Any single rule triggering is sufficient grounds for investigation. The table below gives each rule's signal condition and its usual physical cause.

# | Rule | Signal condition | What it usually means
1 | Beyond ±3σ | 1 point outside control limits | Sudden shift, special event, measurement error
2 | 9 same side | 9 consecutive points all above or all below CL | Process mean shift, new lot, operator change
3 | 6 trend | 6 consecutive points all increasing or all decreasing | Tool wear, gradual drift, raw material degradation
4 | 14 alternating | 14 consecutive points alternating up/down | Two processes alternating, overadjustment/tampering
5 | 2 of 3 beyond ±2σ | 2 of 3 consecutive beyond ±2σ same side | Process shift starting, material lot change
6 | 4 of 5 beyond ±1σ | 4 of 5 consecutive beyond ±1σ same side | Systematic bias, gradual drift
7 | 15 within ±1σ | 15 consecutive points all within ±1σ of CL | Stratification — mixed streams in subgroups, limits too wide
8 | 8 beyond ±1σ both sides | 8 consecutive points outside ±1σ (above and below) | Bimodal / mixture of two process distributions
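As a sketch of how the two most practically important rules reduce to code (hypothetical data; `cl` and `sigma` would come from the chart's own centreline and estimated sigma):

```python
def rule1(points, cl, sigma):
    """Rule 1: any single point beyond +/- 3 sigma of the centreline."""
    return [i for i, x in enumerate(points) if abs(x - cl) > 3 * sigma]

def rule2(points, cl, run=9):
    """Rule 2: `run` consecutive points on the same side of the centreline."""
    hits, streak, side = [], 0, 0
    for i, x in enumerate(points):
        s = 1 if x > cl else -1 if x < cl else 0  # a point on the CL breaks the run
        streak = streak + 1 if (s == side and s != 0) else 1
        side = s
        if streak >= run:
            hits.append(i)
    return hits

data = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.2, 0.5, 0.4, 3.4]
print(rule1(data, cl=0.0, sigma=1.0))  # -> [9]: last point beyond 3 sigma
print(rule2(data, cl=0.0))             # -> [8, 9]: 9th and 10th point above CL
```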
💡

Common cause vs special cause. Control charts separate random noise (common cause — inherent system variation) from assignable events (special cause — investigate and fix). Reacting to common cause variation is tampering — it adds variation. Rules 1 and 2 are the most practically important: use them always. Rules 5–8 add sensitivity but also false alarms — apply them when the cost of missing a shift is high.

Capability Analysis — The Complete Framework

Capability analysis answers a single fundamental question: can this process reliably produce output that meets customer requirements? It does so by fitting a statistical model to process data and estimating the probability of producing nonconforming product — now and in the future. Before any capability number is trustworthy, however, three conditions must hold: the process must be stable, the data must be approximately normal, and there must be enough observations for the statistics to carry real precision. Failing any one of these makes the resulting Cpk figure meaningless.

📐

Two types of capability study: A single-variable capability analysis evaluates one CTQ characteristic against its specification limits. A before/after capability comparison determines whether a process improvement project produced a measurable, statistically confirmed improvement in capability — not just noise.

📊 Three Prerequisites Before Computing Any Capability Index
① Stability: the process must be in statistical control. Verify via an I-MR or X̄-R/S chart first; if unstable, find and remove special causes.
② Normality: the data must follow a normal (or near-normal) distribution. Test with Anderson-Darling; if non-normal, apply a Box-Cox transform or use non-normal capability methods.
③ Sufficient data: minimum 100 observations preferred; absolute minimum 30 (Bothe, 1997). 100 observations give a 90% CI within ±15% of the true Z.

① Process Stability — The First Gate

Capability statistics estimate a future defect rate, not just a historical snapshot. That projection is only valid if the process is operating in a stable, predictable state — meaning only common-cause variation is present and no special causes are inflating or shifting the output. A capability study on an unstable process produces a number that describes a process that no longer exists.

The eight Western Electric stability tests are available for variables control charts, but using all eight simultaneously drives up the false-alarm rate. Research comparing sensitivity and false-alarm behaviour identified three tests that give the best balance for capability pre-screening:

TEST 1 — Always Used

Point Beyond Control Limits

Signals when any single point lies more than 3 standard deviations from the centreline. Universally recognised as the primary out-of-control signal. False alarm rate: 0.27% — the baseline for all other tests. Applied to all chart types: I-MR, X̄-R/S.

Signal: 1 point > ±3σ from CL · FAR=0.27%
TEST 2 — Detects Mean Shifts

9 Consecutive Points, One Side

Signals when 9 successive points all fall on the same side of the centreline. Simulation showed that combining Test 2 with Test 1 reduces the average subgroups needed to detect a 0.5σ mean shift from 154 to just 57 — a 63% improvement in detection speed. Applied to I-chart and X̄-chart only.

Signal: 9 pts same side · Detects small shifts
TEST 7 — Detects Stratification

12–15 Points Within ±1σ

Signals when an unusual number of consecutive points cluster within ±1σ of the centreline — the opposite of what Test 1 catches. This pattern reveals stratification: multiple distinct populations mixed into a single subgroup (e.g. two machines sampled together). Used only on the X̄-chart when limits are estimated from data.

k = subgroups × 0.33 · min 12, max 15 pts
k = subgroups × 0.33 | Points in a row required for a Test 7 signal
k < 12 | Use the fixed minimum of 12 — too few subgroups for the adaptive rule
12 ≤ k ≤ 15 | Adaptive: scale with data volume for balanced sensitivity
k > 15 | Cap at 15 — prevents excessive false alarms with large datasets
⚠️

Tests 3, 4, 5, 6, and 8 are excluded from pre-capability screening. Tests 3 (trends) and 4 (alternating) add no unique detection power over Tests 1+2. Tests 5, 6, and 8 don't isolate special cause patterns common enough to justify their false-alarm cost. For the R, S, and MR charts (spread charts), only Test 1 is applied — extreme spread points are the only practically relevant signal.

② Normality Testing — The Anderson-Darling Approach

Standard capability indices (Cp, Cpk, Pp, Ppk) are derived from the normal distribution. They convert a Z-score — the number of standard deviations between the process mean and the nearest specification limit — into a defect probability using the normal CDF. If the process data doesn't follow a normal distribution, those Z-to-DPMO conversions are wrong, and every capability index based on them is wrong.

The Anderson-Darling (AD) test is the preferred normality test for capability pre-screening. Compared to other goodness-of-fit tests, the AD test has higher statistical power — especially in the tails of the distribution, which is precisely where capability defects occur. The concern that the AD test becomes overly strict with large samples is not supported by simulation evidence: across sample sizes from 500 to 10,000, and across normal populations with varying spreads, the Type I error rate consistently tracks the target significance level (≈5% at α=0.05).

📊 Normality Assessment Decision Flow
Run the AD test on the data → is p ≥ 0.05? YES: proceed with normal capability. NO: is Box-Cox feasible? YES: transform and re-test AD; if p ≥ 0.05, compute capability on the λ-transformed data. NO: use non-normal capability methods.
AD Test — Why it beats alternatives

The AD test accumulates squared deviations between the empirical CDF and the theoretical normal CDF with extra weight given to the tails. Since nearly all capability defects occur in the tails, this weighting is exactly what's needed. The Kolmogorov-Smirnov test applies equal weight throughout the distribution — it can miss tail problems that dominate capability.

Box-Cox Transform — When normality fails

The Box-Cox power transformation x → (x^λ − 1)/λ can often convert a moderately skewed distribution into an approximately normal one. The optimal λ is found by maximum likelihood. Once transformed data passes the AD test, capability indices are computed on the transformed scale. The Cpk result describes performance on the original scale after back-transformation.
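A sketch of this gate with scipy (note that scipy's `anderson` returns a statistic with critical values rather than a p-value, so the pass/fail check compares against the 5% critical value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.lognormal(mean=0.0, sigma=0.4, size=200)   # deliberately skewed

res = stats.anderson(data, dist="norm")
crit_5 = res.critical_values[list(res.significance_level).index(5.0)]

if res.statistic > crit_5:                 # fails normality at alpha = 0.05
    transformed, lam = stats.boxcox(data)  # maximum-likelihood lambda
    res2 = stats.anderson(transformed, dist="norm")
    print(f"lambda = {lam:.2f}, AD statistic after transform = {res2.statistic:.3f}")
```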

AD Test Simulation Evidence — Type I Error & Power

Extensive simulation work confirmed two important properties of the AD test for capability analysis contexts:

Property | What was tested | Result | Practical meaning
Type I Error | 5,000 samples from normal populations (σ = 0.1 to 70) at n = 500 to 10,000 | ≈ 5% rejection rate at α = 0.05, consistent across all sample sizes and dispersions | The AD test does not become overly strict with large datasets — a common practitioner concern that simulation disproved
Power (correct rejection) | 5,000 samples from 17 non-normal distributions (t, Laplace, Uniform, Beta, Gamma, Weibull, etc.) | ≈ 100% rejection for nearly all distributions at n ≥ 500 | If your data isn't normal, the AD test will detect it — with one exception
Power exception | Beta(3,3) at n < 1000; Weibull(4,4) at n < 3000 | Not reliably rejected | These distributions are visually indistinguishable from normal — a normal capability model provides a good approximation and produces reliable estimates

③ How Much Data Do You Actually Need?

The required sample size depends on two things: the true capability of your process and the precision you need from the estimate. These are connected — at high sigma levels, even rough estimates of Z (±15%) translate into a range of DPMO values that is practically acceptable. At lower sigma levels, the same ±15% range spans thousands of DPMO, which may be unacceptable for decision-making.

The AIAG SPC reference manual recommends at least 25 rational subgroups and a minimum of 100 total measurements. Independent simulation work generating 10,000 benchmark-Z estimates at each sample size confirmed this guidance:

Confidence level | Precision margin | Target Z > 3 (typical capable process) | Target Z ≈ 2.5 (marginal process)
90% | ±15% of true Z | ~100 observations |
90% | ±10% of true Z | ~175 observations | ~215 observations
90% | ±5% of true Z | ~650 observations | ~750 observations
95% | ±15% of true Z | ~150 observations | ~175 observations
95% | ±10% of true Z | ~200 observations | ~250 observations
💡

The 100-observation rule explained: With 100 measurements from a process where Z>3, you can be 90% confident that your computed benchmark Z lies within ±15% of the true Z. For a truly 6σ process (Z=4.5 long-term), that confidence interval spans roughly Z=3.8 to Z=5.2. Doubling to 175 observations tightens this to ±10%. For most industrial go/no-go decisions, 100 measurements is sufficient; for precise capability reporting in supply chain audits, target 175+.

Why Precision Matters More at Lower Sigma Levels

True Z | True DPMO | ±15% precision → Z range | DPMO range at ±15% | Practical impact
4.5σ | 3.4 | 3.83 – 5.18 | 0.9 – 13.3 DPMO | Acceptable — the difference between 1 and 13 defects per million is rarely decision-critical
3.0σ | 1,350 | 2.55 – 3.45 | ~280 – 5,400 DPMO | Significant — a 19× DPMO range makes pass/fail decisions unreliable
2.5σ | 6,210 | 2.13 – 2.88 | 1,970 – 16,400 DPMO | Unacceptable for reporting — increase sample size to ≥200 before drawing conclusions

The Recommended Capability Study Sequence

  1. Define CTQ and specification limits: confirm LSL/USL are customer-driven, not internally tightened. Incorrect spec limits make all downstream analysis meaningless.

  2. Validate the measurement system (GR&R): if the gauge R&R exceeds 30% of process variation, the capability index will be systematically underestimated. Fix measurement before measuring capability.

  3. Plot an I-MR or X̄-R/S control chart: run Tests 1, 2, and 7 (for X̄). Remove special causes before continuing. Do not compute Cpk from an unstable process.

  4. Test for normality with Anderson-Darling: if p < 0.05, attempt a Box-Cox transformation. If transformation fails, use non-normal capability methods (e.g. Weibull capability, non-parametric percentile approach).

  5. Collect at least 100 observations: fewer than 30, do not report Cpk; 30–99, flag as preliminary; 100+, acceptable for capability reporting; 175+, preferred for formal PPAP submission.

  6. Compute and report Cp, Cpk, Pp, Ppk: report confidence intervals alongside the point estimates (see the sketch after this list). A Cpk of 1.35 with a 95% CI of [1.10, 1.62] tells a very different story than just "1.35".

  7. Interpret with AIAG thresholds — but don't stop there: Cpk ≥ 1.67 (initial CC), Cpk ≥ 1.33 (ongoing). Always pair the capability index with a probability plot, histogram overlay, and DPMO estimate. Never report a number without context.
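One common way to attach that interval is Bissell's (1990) approximation for the standard error of the estimated Cpk; a minimal sketch (the approximation assumes normal data and a stable process):

```python
from math import sqrt
from scipy.stats import norm

def cpk_ci(cpk_hat, n, conf=0.95):
    """Approximate confidence interval for Cpk (Bissell, 1990)."""
    z = norm.ppf(0.5 + conf / 2)
    se = sqrt(1 / (9 * n) + cpk_hat**2 / (2 * (n - 1)))
    return cpk_hat - z * se, cpk_hat + z * se

lo, hi = cpk_ci(1.35, n=100)
print(f"Cpk 1.35, n=100 -> 95% CI [{lo:.2f}, {hi:.2f}]")  # roughly [1.15, 1.55]
```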

Before/After Capability Comparison — Verifying Improvement

A before/after capability comparison is used at the end of a DMAIC Improve phase to confirm that an improvement action produced a real, statistically significant improvement — not a random fluctuation. The same three prerequisites apply to both datasets independently. Key considerations:

Valid comparison requires
  • ✓ Both datasets from stable processes (independently verified)
  • ✓ Both datasets passing normality (or same transformation applied)
  • ✓ Minimum 100 observations in each group
  • ✓ Same measurement system used for both (GR&R unchanged)
  • ✓ Statistical significance test on Cpk difference (use non-central F)
Common before/after errors
  • ✗ "After" data collected during unstable trial run (Hawthorne effect)
  • ✗ Sample sizes too small to detect a meaningful Cpk improvement
  • ✗ Gauge R&R changed between before and after studies
  • ✗ Declaring success from point estimates alone — use confidence intervals
  • ✗ Not waiting long enough for the "after" data to represent steady-state

Summary rule: Stability → Normality → Sufficient data. In that order, with no shortcuts. A Cpk computed without verifying all three is a number without a foundation. Compute it if you must, but flag it clearly as unvalidated and treat it as indicative only — never as a basis for a PPAP approval or a customer capability commitment.

Nelson Rules — All 8 Rules with Probabilities & Causes

Nelson Rules (also called Western Electric / Shewhart Rules) detect special cause variation. Each rule has a known false-alarm probability — this is the probability it triggers even when the process is in statistical control (i.e., common cause variation only).

# | Pattern | False alarm probability | Probable special cause
1 | 1 point more than 3σ from centreline | (1−0.9973) = 0.0027 | New operator, wrong setup, measurement error, out-of-spec material
2 | 7 points in a row on the same side of the centreline | (0.5)⁷ = 0.0078 | Process mean has shifted — setup change, tool wear, material batch change
3 | 7 points in a row all increasing or all decreasing | 0.0017 | Trend — tool wear, gradual deterioration, temperature drift
4 | 14 points in a row alternating up and down | 0.0002 | Over-control / tampering — operator adjusting too frequently
5 | 2 out of 3 consecutive points more than 2σ from centreline (same side) | 0.003 | New operator, wrong setup — similar to Rule 1 but detects smaller shifts
6 | 4 out of 5 consecutive points more than 1σ from centreline (same side) | 0.005 | Small sustained shift in process mean
7 | 14 points in a row within 1σ of centreline (either side) | (0.68)¹⁴ = 0.0045 | Process improvement, reduced variation, or stratified sampling mixing two distributions
8 | 8 points in a row more than 1σ from centreline (either side) | (1−0.68)⁸ = 0.0001 | Mixture of two processes — two machines, two shifts, or two operators being combined
Zone Labels (σ bands)

The control chart is divided into zones from the centreline outward:

  • Zone C — within 1σ of centreline (≈68% of points here)
  • Zone B — between 1σ and 2σ from centreline (≈27%)
  • Zone A — between 2σ and 3σ from centreline (≈4.3%)
  • Beyond 3σ — outside control limits (≈0.27%)
Practitioner Tips
  • ✓ Rule 1 (beyond 3σ) is always the most obvious — the simplest special cause signal
  • ✓ Rule 2 (7-in-a-row same side) is the most common exam scenario — a mean shift
  • ✓ Rule 4 (alternating 14 points) = over-control. The fix is to stop adjusting.
  • ✓ Rule 7 (hugging centreline) = artificially low variation, often from stratified subgroups mixing two processes
  • ✓ False alarm rate multiplies with each rule added — more rules = more false alarms
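To make the detection mechanics concrete, here is a minimal sketch in Python (hypothetical data and helper names) of Rules 1 and 2; the remaining rules follow the same run-counting pattern over sliding windows.

```python
import numpy as np

def nelson_rule_1(x, center, sigma):
    """Rule 1: any point more than 3 sigma from the centreline."""
    return np.abs(x - center) > 3 * sigma

def nelson_rule_2(x, center, run=7):
    """Rule 2: `run` consecutive points on the same side of the centreline."""
    side = np.sign(x - center)
    flags = np.zeros(len(x), dtype=bool)
    count = 0
    for i, s in enumerate(side):
        count = count + 1 if i > 0 and s == side[i - 1] and s != 0 else 1
        if count >= run:
            flags[i] = True
    return flags

rng = np.random.default_rng(1)
x = rng.normal(10.0, 0.5, 30)
x[20:] += 0.8                      # inject a mean shift at point 20
print("Rule 1 hits:", np.flatnonzero(nelson_rule_1(x, 10.0, 0.5)))
print("Rule 2 hits:", np.flatnonzero(nelson_rule_2(x, 10.0)))
```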

Process Capability Indices — Complete Reference

Capability indices quantify how well a process fits within its specification limits. Each member of the family (Cp, Cpk, Pp, Ppk) answers a slightly different question. Understanding when to use which index, and the conditions that must be met, is essential in day-to-day engineering practice.

Short-Term Capability Indices (Within σ)

Index | Formula | What it measures | Limitation
Cp | Cp = (USL − LSL) / (6·σwithin) | Potential capability — how wide the spec is relative to the process spread | Ignores centring. A high Cp with a poorly centred process will still produce defects
CpL | CpL = (X̄ − LSL) / (3·σwithin) | Lower capability — distance from mean to lower spec in σ units | One-sided; use when only a lower limit matters
CpU | CpU = (USL − X̄) / (3·σwithin) | Upper capability — distance from mean to upper spec in σ units | One-sided; use when only an upper limit matters
Cpk | Cpk = min(CpL, CpU) | Actual short-term capability — accounts for both spread and centring | The most commonly used index. If the process is perfectly centred, Cpk = Cp
Cr | Cr = 1/Cp = 6σ/(USL−LSL) | Capability ratio — percentage of tolerance used by the process | Cr × 100 = % tolerance consumed. Lower is better; Cr < 1.0 means Cp > 1.0

Long-Term Performance Indices (Overall σ)

Index | Formula | Key difference from Cp/Cpk
Pp | Pp = (USL − LSL) / (6·σoverall) | Uses overall (total) standard deviation — includes all sources of variation over time (between-subgroup + within-subgroup)
PpL | PpL = (X̄ − LSL) / (3·σoverall) | One-sided lower performance index
PpU | PpU = (USL − X̄) / (3·σoverall) | One-sided upper performance index
Ppk | Ppk = min(PpL, PpU) | Ppk ≤ Cpk always — the gap between Cpk and Ppk indicates how much the process mean has drifted or shifted over time
Cp vs Cpk vs Pp vs Ppk — Summary
 | Short-term (within σ) | Long-term (overall σ)
Potential (centring ignored) | Cp | Pp
Actual (centring included) | Cpk | Ppk
💡

If Cpk ≈ Ppk: the process is stable over time. If Cpk >> Ppk: the process has shifted or drifted — investigate between-subgroup variation.

Capability vs Rejection Rates
Cp / Cpk | Sigma level | Rejection rate
1.00 | 3σ | 0.27% (2,700 ppm)
1.33 | 4σ | 64 ppm
1.67 | 5σ | 0.6 ppm
2.00 | 6σ | 2 ppb
💡

Four conditions required: (1) sample represents population, (2) data is normally distributed, (3) process is in statistical control, (4) sample size is sufficient.
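The rejection rates in the table follow directly from the normal tail probability. A short sketch (Python with SciPy) for a centred process, assuming Cp = Cpk:

```python
from scipy.stats import norm

def rejection_ppm(cp):
    """Two-sided defect rate in ppm for a centred normal process (Cp = Cpk)."""
    z = 3 * cp                  # sigmas from the mean to each spec limit
    return 2 * norm.sf(z) * 1e6

for cp in (1.00, 1.33, 1.67, 2.00):
    print(f"Cp = {cp:.2f} -> {rejection_ppm(cp):,.4g} ppm")
# Cp = 1.00 -> ~2,700 ppm ... Cp = 2.00 -> ~0.002 ppm (2 ppb)
```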

Short-Run SPC — Monitoring Low-Volume Production

A typical control chart needs 20–25 subgroups (≈100 data points) to establish reliable control limits. Short-run SPC solves the problem of low-volume or mixed-part production where insufficient data exists for traditional charts.

The Problem

When producing different-diameter items (e.g. 300mm, 400mm, 500mm) in small runs of 8 each, options are:

  • ✗ 100% inspection — expensive
  • ✗ First-off inspection only — misses process variation
  • ✗ Last-off inspection — too late to react
  • ✗ Separate chart per part — too little data per chart
  • ✓ Short-run chart — plots all parts on one chart by transforming the data

Key Principle

Short-run SPC focuses on the process, not the product. By transforming raw measurements, parts with different nominal values can be plotted on a single chart — revealing process stability across multiple part numbers.

💡

Only valid if the different part runs have similar variance. If variance differs significantly between parts, a Z-MR chart (standardised) is needed instead.

Two Short-Run Chart Methods

Difference Chart (similar variance)

Subtract the nominal value for each run. Plot the deviations on a standard I-MR chart.

Difference = Actual − Nominal
Run A nominal = 300 → 302.6 − 300 = 2.6
Run B nominal = 500 → 504.2 − 500 = 4.2
Run C nominal = 400 → 400.5 − 400 = 0.5
→ Plot all differences on one I-MR chart
Z-MR Chart (different variance between runs)

Standardise each measurement using the run's own mean and standard deviation. The Z score is plotted — chart limits are always ±3 regardless of part.

Z = (Xi − X̄ᵣᵤₙ) / σᵣᵤₙ
UCL = +3, LCL = −3 always
CL = 0 always
→ All parts, all runs, one chart
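A minimal sketch of both transformations (Python, hypothetical run data) makes the difference concrete:

```python
import numpy as np

# Hypothetical short-run data: nominal value and measurements per run
runs = {
    "A (nom 300)": (300.0, np.array([302.6, 301.1, 299.4, 300.8])),
    "B (nom 500)": (500.0, np.array([504.2, 498.9, 501.3, 500.2])),
    "C (nom 400)": (400.0, np.array([400.5, 399.2, 401.1, 400.0])),
}

# Difference chart: subtract each run's nominal, then plot on one I-MR chart
diffs = np.concatenate([x - nom for nom, x in runs.values()])
print("Deviations for a single I-MR chart:", np.round(diffs, 1))

# Z-MR chart: standardise with each run's own mean and standard deviation
for label, (nom, x) in runs.items():
    z = (x - x.mean()) / x.std(ddof=1)
    print(f"{label}: Z = {np.round(z, 2)} (limits are always +/-3)")
```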

Cpk vs Ppk — Real-World Scenario with Full Worked Example

Cpk and Ppk look similar on paper but measure fundamentally different things. Cpk measures what your process can do when it's running well. Ppk measures what it actually does over extended time — including every shift change, raw material lot, and seasonal temperature swing. The gap between them tells a story about process management, not just process performance.

The Fundamental Difference

Cpk — Short-Term / Within
What the process can do

Uses σWithin — estimated from within-subgroup variation only. Strips out the noise from subgroup-to-subgroup shifts. Represents the process at its best, as if operating under one stable short-term condition.

Cpk = min[(USL−X̄)/(3·σW), (X̄−LSL)/(3·σW)]
Ppk — Long-Term / Overall
What the process actually does

Uses σOverall — the plain sample standard deviation across all observations. Includes every source of variation: within-subgroup, between-subgroup, drift, shift, operator, raw material. The real-world performance index.

Ppk = min[(USL−X̄)/(3·σO), (X̄−LSL)/(3·σO)]
How the Two Sigmas Are Computed (from Minitab Technical Documentation)
σWithin — Strips drift out
σ̂W = sp / c₄(d)
sp = √[Σ(xij−x̄i)² / Σ(ni−1)]
Pooled std dev across subgroups (default)

or: σ̂W = MR̄ / d₂(w)
Average moving range when n=1
σOverall — Includes everything
σ̂O = s / c₄(n)
s = √[ΣiΣj(xij−x̄)² / (n−1)]
Plain sample std dev, all data pooled

σ̂O ≥ σ̂W always
∴ Ppk ≤ Cpk always

Visual: Why Cpk > Ppk When the Process Drifts

📊 Short-term vs Long-term variation — how mean drift inflates σOverall
[Figure: three narrow within-shift curves (Shifts 1–3) inside LSL/USL vs one much wider overall curve — σOverall > σWithin, so Ppk < Cpk whenever the process drifts between shifts]

Each narrow blue curve is a subgroup's short-term behaviour — tight, capable, well within spec. But when all three shifts combine into the long-term picture (red dashed curve), the overall spread is much wider. This is why Ppk ≤ Cpk always. The gap is not measurement error — it's process management information.

Worked Example — Automotive Fuel Injector Flow Rate

The Scenario

A fuel injector flow rate must meet LSL = 195 cc/min, USL = 205 cc/min (tolerance = 10 cc/min). You run a production study: 25 subgroups of n=5 collected over 3 production shifts across 5 days. The process uses Rbar to estimate σWithin.

Capability Study Results
Grand mean X̄̄ = 200.8 cc/min
Average range R̄ = 2.34 cc/min
d₂(n=5) = 2.326
c₄(N=125) ≈ 0.998 (correction negligible at this sample size)
Overall s = 1.62 cc/min
n total = 125 observations
Step 1 — Compute Both Sigmas
σ̂W = R̄ / d₂(5) = 2.34 / 2.326
   = 1.006 cc/min

σ̂O = s / c₄(125) = 1.62 / 0.998
   = 1.623 cc/min
Step 2 — Compute Cpk (short-term)
CPU = (205 − 200.8) / (3 × 1.006)
    = 4.2 / 3.018 = 1.392
CPL = (200.8 − 195) / (3 × 1.006)
    = 5.8 / 3.018 = 1.922
Cpk = min(1.922, 1.392) = 1.39
Step 3 — Compute Ppk (long-term)
PPU = (205 − 200.8) / (3 × 1.623)
    = 4.2 / 4.869 = 0.863
PPL = (200.8 − 195) / (3 × 1.623)
    = 5.8 / 4.869 = 1.191
Ppk = min(1.191, 0.863) = 0.86
📋 Reading the Results — What Cpk = 1.39, Ppk = 0.86 Actually Means
Cpk = 1.39 (✓ Good)
When this process runs stably within a shift, it is capable — the machine can hit spec consistently. The process potential meets the standard for ongoing production (Cpk ≥ 1.33).
Ppk = 0.86 (✗ Poor)
Over the 5-day study, the process is not capable. The large gap (Cpk − Ppk = 0.53) reveals significant between-shift or between-day variation — likely from warm-up drift, operator differences, or raw material lot variation.
💡

The engineering decision: Do not report only Cpk. A customer seeing Cpk = 1.39 would approve the PPAP. But Ppk = 0.86 tells the real story — this process will produce field defects at rates far above what Cpk predicts. The correct action is to investigate the source of between-shift variation, fix it, then re-run the study with both indices reporting ≥ 1.33.
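The whole study is compact enough to verify in a few lines. A sketch (Python) reproducing both indices from the summary statistics, using c₄ for the full N = 125:

```python
lsl, usl = 195.0, 205.0
xbar, rbar, s = 200.8, 2.34, 1.62
d2_n5 = 2.326            # d2 constant for subgroup size 5
c4_125 = 0.998           # c4 for N = 125, essentially 1

sigma_w = rbar / d2_n5   # short-term sigma (within)
sigma_o = s / c4_125     # long-term sigma (overall)

cpk = min((usl - xbar) / (3 * sigma_w), (xbar - lsl) / (3 * sigma_w))
ppk = min((usl - xbar) / (3 * sigma_o), (xbar - lsl) / (3 * sigma_o))
print(f"sigma_within = {sigma_w:.3f}, sigma_overall = {sigma_o:.3f}")
print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}, gap = {cpk - ppk:.2f}")
# Cpk = 1.39, Ppk = 0.86, gap = 0.53
```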

Interpreting the Cpk–Ppk Gap

Cpk vs Ppk pattern | What it means | Typical root cause | Action
Cpk ≈ Ppk (gap < 0.1) | Process is stable over time | No significant between-subgroup drift. What you see in a short run is what you get long-term. | Report Ppk to customer. No additional investigation needed.
Cpk moderately > Ppk (gap 0.1–0.3) | Some long-term drift present | Gradual tool wear, ambient temperature, material lot variation. Process is capable but not perfectly controlled. | Investigate between-subgroup sources. Tighten control plan.
Cpk significantly > Ppk (gap > 0.3) | Serious stability problem | Shift changes, operator methods, machine warm-up, batch material variation. Multiple distinct process streams being reported as one. | Do not submit this PPAP. Conduct MSE (Multi-Stream Evaluation). Stratify data by suspected source.
Ppk > Cpk | Unusual — investigate | Within-subgroup variation is inflated (e.g. too much between-part variation sampled in one subgroup — irrational subgrouping). | Review subgrouping strategy. Rational subgroups should represent only short-term common-cause variation.

Confidence Intervals — Never Report a Point Estimate Alone

A Cpk of 1.33 computed from 30 samples has a very different meaning than the same value from 200 samples. Confidence intervals quantify this uncertainty. These formulas are from the Minitab capability analysis documentation.

Cp — χ²-based CI
Lower = Ĉp · √(χ²(1−α/2, ν) / ν)
Upper = Ĉp · √(χ²(α/2, ν) / ν)
ν = k(n−1), the degrees of freedom of the within-σ estimate
Cpk — normal-approximation CI
Lower = Ĉpk − z(α/2) · √(1/(9kn) + Ĉpk²/(2ν))
Upper = Ĉpk + z(α/2) · √(1/(9kn) + Ĉpk²/(2ν))
k = subgroups, n = average subgroup size
Pp — χ²-based CI (overall)
Lower = P̂p · √(χ²(1−α/2, kn−1) / (kn−1))
Upper = P̂p · √(χ²(α/2, kn−1) / (kn−1))
Ppk — normal-approximation CI
Lower = P̂pk − z(α/2) · √(1/(9kn) + P̂pk²/(2(kn−1)))
Upper = P̂pk + z(α/2) · √(1/(9kn) + P̂pk²/(2(kn−1)))
📌

Applied to our example (Cpk = 1.39, k=25, n=5): The 95% CI for Cpk is approximately [1.15, 1.63]. This means we cannot be certain the true Cpk exceeds 1.33 — it might be as low as 1.15. This is why 125 observations is borderline for formal PPAP submission; aim for 175+ to tighten the CI.
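A minimal sketch of the Cpk interval (Python with SciPy), using ν = k(n−1) as the within-σ degrees of freedom; packages differ slightly in their df conventions, so expect small deviations from the interval quoted above:

```python
from scipy.stats import norm

def cpk_ci(cpk, k, n, alpha=0.05):
    """Normal-approximation CI for Cpk (Bissell-type)."""
    kn = k * n                       # total observations
    nu = k * (n - 1)                 # df of the within-sigma estimate
    se = (1 / (9 * kn) + cpk**2 / (2 * nu)) ** 0.5
    z = norm.ppf(1 - alpha / 2)
    return cpk - z * se, cpk + z * se

lo, hi = cpk_ci(1.39, k=25, n=5)
print(f"95% CI for Cpk: [{lo:.2f}, {hi:.2f}]")   # roughly [1.19, 1.59]
```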

σ Estimation Methods — Which Formula Does Your Software Use?

Cpk and Ppk use different sigma estimates, and within each, there are multiple methods depending on subgroup size and data structure. Understanding which formula applies to your situation prevents misinterpretation — especially when comparing indices across software platforms.

Overview — When Each Method Applies

Sigma type | Method | When used | Used for
σWithin (short-term — Cp, Cpk) | Pooled std dev | Subgroup size n > 1 (default) | Cp, Cpk, UCL/LCL on X̄ chart
σWithin (short-term — Cp, Cpk) | Rbar (average range) | Subgroup size n > 1, alternative method | X̄-R charts — traditional method
σWithin (short-term — Cp, Cpk) | Average moving range (MR̄) | Subgroup size n = 1 (default) | I-MR charts — individual measurements
σOverall (long-term — Pp, Ppk) | Sample std dev | All scenarios | Pp, Ppk — always this formula

σOverall — The Long-Term Standard Deviation

Always the plain sample standard deviation across all observations, corrected by the c₄ unbiasing constant. This is the denominator for Pp and Ppk.

Formula (Minitab default)
σ̂Overall = s / c₄(n)

s = √[ ΣiΣj(xij − x̄)² / (n − 1) ]

where n = total observations, x̄ = grand mean across all data
c₄(n) → 1 as n → ∞ (correction negligible for n > 50)
⚠️

σOverall includes all sources of variation: within-subgroup + between-subgroup + drift + shift + any systematic effects. It is always ≥ σWithin, which is why Ppk ≤ Cpk always.

σWithin Method 1 — Pooled Standard Deviation (Default, n > 1)

The default method when subgroup size > 1. Pools variance across all subgroups, then applies the c₄ unbiasing constant. This is what Minitab and most SPC software use by default.

Pooled Standard Deviation Formula
σ̂Within = sp / c₄(d)

sp = √[ ΣiΣj(xij − x̄i)² / Σi(ni−1) ]

d = Σ(ni−1) + 1    (degrees of freedom)

When subgroup size is constant: sp = √(Σsi² / k), d = n − k + 1

σWithin Method 2 — Rbar (Average Range, n > 1)

The traditional control chart method — divides the average range by the d₂ constant. Used on X̄-R charts. Equivalent to pooled std dev when subgroup size is constant, but less efficient for unequal subgroup sizes.

Rbar Formula (equal subgroup sizes)
σ̂Within = R̄ / d₂(ni)

R̄ = (R₁ + R₂ + ... + Rk) / k

Unequal subgroup sizes: uses weighted formula fi = [d₂(ni)]² / [d₃(ni)]²

σWithin Method 3 — Average Moving Range (Default, n = 1)

When individual measurements are collected (subgroup size = 1), within-subgroup variation is estimated from consecutive differences — the moving range. This is the I-MR chart approach.

Average Moving Range Formula (w=2, default)
σ̂Within = MR̄ / d₂(w)

MRi = |xi − xi−1|    (for w=2, consecutive pairs)

MR̄ = (MR2 + MR3 + ... + MRn) / (n − w + 1)

d₂(2) = 1.128 · Median MR variant: σ̂ = MR̃ / d₄(w)

σWithin Method 4 — Sbar (Average of Subgroup Standard Deviations)

Used on X̄-s charts. More efficient than Rbar for large subgroup sizes (n > 10). Applies c₄ weighting per subgroup.

Sbar Formula (unequal subgroup sizes)
σ̂Within = Σ[hi·si/c₄(ni)] / Σhi

hi = [c₄(ni)]² / [1 − c₄(ni)²]

When subgroup size is constant: σ̂ = s̄ / c₄(n), s̄ = Σsi/k
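The estimators themselves are only a few lines each. A sketch (Python, with constants hard-coded from the reference table that follows) of the Rbar, moving-range, and overall methods:

```python
import numpy as np

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}       # d2 constants
C4 = {2: 0.7979, 3: 0.8862, 4: 0.9213, 5: 0.9400}   # c4 constants

def sigma_within_rbar(subgroups):
    """Rbar method: average subgroup range / d2 (equal subgroup sizes)."""
    n = len(subgroups[0])
    rbar = np.mean([max(g) - min(g) for g in subgroups])
    return rbar / D2[n]

def sigma_within_mr(x, w=2):
    """Average moving range / d2(w) for individual (n = 1) data."""
    mr = np.abs(np.diff(x))
    return mr.mean() / D2[w]

def sigma_overall(x):
    """Plain sample standard deviation; c4 correction is negligible for large n."""
    return np.std(x, ddof=1)

subgroups = [[9.9, 10.1, 10.0, 10.2, 9.8], [10.3, 10.1, 10.4, 10.2, 10.5]]
print("sigma_within (Rbar):", round(sigma_within_rbar(subgroups), 3))
print("sigma_overall      :", round(sigma_overall(np.concatenate(subgroups)), 3))
```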

Unbiasing Constants — c₄ and d₂ Reference Table

These constants correct the bias in sigma estimates from small samples. c₄ is used with standard deviations; d₂ is used with ranges. Both approach 1 as sample size increases.

n (subgroup size) | c₄ | Used in σ̂ = s/c₄
2 | 0.7979 | Pooled σ, Sbar
3 | 0.8862 |
4 | 0.9213 |
5 | 0.9400 | Most common
6 | 0.9515 |
8 | 0.9650 |
10 | 0.9727 |
25 | 0.9896 |
∞ | 1.0000 | Bias negligible

n (subgroup size) | d₂ | Used in σ̂ = R̄/d₂
2 | 1.128 | MR chart (w=2)
3 | 1.693 |
4 | 2.059 |
5 | 2.326 | Most common
6 | 2.534 |
7 | 2.704 |
8 | 2.847 |
9 | 2.970 |
10 | 3.078 |

Which Method Should You Use?

n = 1
Individual data
σ̂ = MR̄/d₂(2)

Use I-MR chart. Default w=2. Average moving range divided by 1.128.

n = 2–9
Small subgroups
σ̂ = sp/c₄(d)

Pooled std dev (default) or Rbar. Use X̄-R chart. Pooled is more efficient.

n ≥ 10
Large subgroups
σ̂ = s̄/c₄(n)

Sbar method. Use X̄-s chart. Range method loses efficiency at n > 9.

💡

Source: All sigma estimation formulas on this page are from the Minitab Technical Support Document — Capability Analysis (Normal) Formulas: Capability Statistics (Default) and the Minitab Assistant White Paper on Capability Analysis. The c₄ and d₂ constants follow Montgomery (2001), Introduction to Statistical Quality Control, Wiley. These are the industry-standard formulas used in all major SPC software.

Applied Statistics

Quantitative Methods & Statistics

Hypothesis testing, confidence intervals, regression, ANOVA, probability distributions, and time-series analysis — the statistical toolkit every quality engineer needs to turn data into defensible decisions.

Data Types, Collection & Descriptive Statistics

Data Classification

Category | Type | Characteristics | Examples
Qualitative (description-based) | Nominal | Categories only — no order, no arithmetic. Central tendency: mode only. | Colour (Red/Blue), Pass/Fail, Product type
Qualitative (description-based) | Ordinal | Ordered categories — differences not meaningful. Central tendency: mode, median. | Good/Bad/Worst, 1–5 star rating, Likert scale
Quantitative (number-based) | Interval | Ordered + equal intervals — no true zero. All central tendency measures valid. | Temperature °C, Calendar year, IQ score
Quantitative (number-based) | Ratio | Ordered + equal intervals + true zero. All calculations valid. | Length, Mass, Volume, Time, Temperature K
Continuous vs Discrete

Continuous: Can take any value in a range. Measurements — length, height, time, temperature. More sensitive, fewer samples needed, but more expensive to collect.

Discrete: Countable, whole numbers only. Number of defects, number of students, yes/no outcomes.

NOIR Mnemonic

Nominal → Ordinal → Interval → Ratio. Each level adds a property: order → equal intervals → true zero. Statistics valid at a lower level remain valid for higher-level data, but higher-level statistics cannot be applied to lower-level data.

Data Collection Plan

Element | Content
Why collect? | Goal, objective, business question to answer
Operational definition | Precise definition of what is being measured — avoids ambiguity between collectors
How much / how / where / when | Sample size, frequency, location, time windows
Type of data | NOIR scale — determines which statistics and charts are appropriate
Collection method | Manual (check sheet) or automatic (sensors, gauges)
Past vs future data | Historical data may have biases; prospective data is preferred
Reliability | Is the measurement system capable? (MSA first)
Data Coding

Transforming data to simplify calculations:

  • Add/Subtract: Mean shifts by the same amount. Standard deviation unchanged.
  • Multiply/Divide: Both mean and SD scale by the same factor.
  • Truncation: Remove a repetitive prefix (e.g. values of the form 0.55x: multiply by 1000, then subtract 550). Reverse the transform to recover the original mean and SD.
Data Quality
  • Imputation: Replacing missing data with substituted values (e.g. row mean). Missing data introduces bias.
  • Benford's Law: In natural data sets, digit 1 appears as leading digit ~30% of the time; digit 9 <5%. Violations can indicate data fabrication or errors.
  • Integrity risks: Bias, lack of knowledge, boredom, rounding, intentional falsification
Worked Example · One dataset · All major measures
Descriptive Statistics — From Raw Data to Meaning

Descriptive statistics are not just mean, median, and mode. A complete descriptive summary explains center, spread, position, frequency, and shape. The goal is to answer five questions: Where is the data centered? How much does it vary? Where do observations sit within the distribution? How often do values occur? And does the shape suggest skewness, heavy tails, or outliers?

Example context: Below is one raw dataset of 30 process measurements. We use the same numbers to explain central tendency, dispersion, position, frequency, and shape — exactly the way descriptive statistics are reported in tools like Excel and Minitab.

44.8, 45.1, 45.3, 45.9, 46.0, 46.2, 46.4, 46.8, 47.1, 47.4, 47.8, 48.0, 48.2, 48.6, 48.9, 49.1, 49.5, 49.9, 50.4, 50.8, 51.3, 51.9, 52.4, 52.8, 53.4, 54.1, 55.0, 56.4, 58.7, 63.2
Mean
50.05
Median
49.00
Std Dev
4.31
Skewness
1.19
Excess Kurtosis
1.27
One Graph — Full Descriptive Statistics Story
This single figure combines frequency, cumulative position, central tendency, quartiles, and tail behavior so users can visually connect the numbers to the shape of the distribution.
[Figure: histogram with a cumulative-% curve and vertical markers for the mean, median, quartiles, and 90th percentile]
The histogram shows frequency. The rising line shows cumulative position. Vertical markers show the mean, median, quartiles, and the 90th percentile. The long right tail and the single large high-end observation (63.2) make the distribution right-skewed.
Center
Mean 50.05 vs Median 49.00
The mean sits slightly above the median because the right tail pulls the average upward.
Spread
SD 4.31 · IQR 5.40
Standard deviation shows total variation; IQR focuses on the stable middle of the data.
Shape
Skew 1.19 · Excess kurtosis 1.27
Positive skew and elevated kurtosis indicate right-tail risk and a few unusually high values.

How to Read Descriptive Statistics

1) Central Tendency — Where is the data centered?

Central tendency describes the “typical” value. The mean uses all observations and shifts toward extreme values. The median is the middle observation and is more stable when data is skewed. The mode is the most frequent value; for continuous measurements it is often estimated by grouping or rounding. In this dataset, the mean (50.05) is slightly above the median (49.00), which hints at a right tail pulling the average upward.

2) Dispersion — How spread out is the data?

Dispersion measures consistency. The range is the full width from minimum to maximum (18.4). The variance (18.59) uses squared deviation, while the standard deviation (4.31) expresses spread in the original units. The IQR (5.40) focuses on the middle 50% of the data and is less sensitive to outliers. A process can have a good mean but still be poor if dispersion is too large.

3) Position — Where do values sit inside the distribution?

Position measures rank. Quartiles divide the data into four parts: Q1 = 46.88, median = Q2 = 49.00, Q3 = 52.27. Percentiles give the value below which a chosen percentage falls. Here the 10th percentile is 45.84 and the 90th percentile is 55.14. These are extremely useful for reporting tails, customer risk, and threshold-based performance.

4) Frequency — How often do values occur?

Frequency tells you how observations are distributed across intervals. The histogram is the main visual tool: tall bars mean many observations in that region, short bars mean few. In descriptive output this idea also appears as counts, relative frequency, and cumulative frequency. Frequency is what turns raw numbers into an interpretable distribution.

5) Shape — Is the distribution symmetric, skewed, or heavy-tailed?

Shape goes beyond average and spread. Skewness (1.19) measures asymmetry: positive skew means a longer right tail, negative skew means a longer left tail, and zero means near-symmetry. Kurtosis looks at tail heaviness and outlier-proneness. The excess kurtosis here is 1.27: values above zero indicate heavier tails than normal, values below zero indicate lighter tails. Shape matters because non-normal shape changes how you interpret means, control limits, and capability.

Shape in Depth — Skewness & Kurtosis

Skewness measures asymmetry: how far the distribution leans. A value of 0 means perfect symmetry. Positive values indicate a long right tail (mean > median), negative values a long left tail (mean < median). In quality engineering, right skewness often signals occasional high-value outliers — tool wear, burst events, occasional defects. Kurtosis measures tail weight. Excess kurtosis = 0 means the tails match a normal distribution. Positive excess kurtosis (leptokurtic) means more extreme values occur than expected — critical for capability analysis because DPMO estimates derived from Cp/Cpk assume normality. In this dataset, skewness = 1.19 and excess kurtosis = 1.27 — both moderate, indicating a slightly heavier right tail and more occasional high outliers than a pure normal would predict.

Skewness — Three Distribution Shapes Compared

Skewness tells you which direction the data has a longer tail, and where the mean sits relative to the median and mode. Rule of thumb: |skewness| < 0.5 = approximately symmetric; 0.5–1.0 = moderate skew; >1.0 = strong skew.

📊 [Figure: skewness compared — negative skew (long left tail), near-zero (symmetric), positive skew (long right tail)]
📊 [Figure: kurtosis compared — platykurtic (excess < 0: broad flat peak, thin tails, e.g. uniform distribution); mesokurtic (excess = 0: the normal reference, e.g. measurement error); leptokurtic (excess > 0: sharp tall peak, fat tails, e.g. financial returns, rare events)]
💡

Quality engineering rule: Always check both skewness and excess kurtosis before computing Cp/Cpk. If |skewness| > 1 or |excess kurtosis| > 2, consider non-normal capability analysis (Weibull, Johnson transformation, or percentile-based methods) instead of assuming normality.

Descriptive Statistics — Central Tendency

Measure | Definition | Formula / method | Properties
Mean (x̄) | Arithmetic average | x̄ = Σx / n | Affected by extreme values (outliers). Used for ratio/interval data.
Mode | Most frequently occurring value | Count occurrences; highest count wins | Only average valid for nominal data. A dataset can have multiple modes (bimodal).
Median | Middle value when sorted ascending | Odd n: middle value. Even n: average of the two middle values. | Not affected by outliers. Preferred for skewed distributions.
Percentile | Value below which P% of data falls | i = P·n/100. If i is whole: average of positions i and i+1. If not: round up to the next position. | Q1 = 25th, Q2 = 50th (median), Q3 = 75th percentile

Descriptive Statistics — Variability

Range
R = Max − Min

Simplest measure of spread. Sensitive to outliers. Example: (6, 9, 10, 11, 11, 14) → R = 14−6 = 8

Interquartile Range (IQR)
IQR = Q3 − Q1

Range of middle 50% of data. Robust to outliers. Example: Q3=11, Q1=9 → IQR = 2. Used in box-and-whisker plots.

Standard Deviation
s² = Σ(xᵢ−x̄)² / (n−1)
s = √s²

Average squared deviation from mean (sample formula uses n−1 for unbiasedness). Example: data (98, 99, 100, 101, 102, 100) → s²=2, s=1.414
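Every number quoted for the 30-point example dataset can be reproduced in a few lines (Python with NumPy/SciPy; NumPy's default linear percentile interpolation matches the quartiles quoted above):

```python
import numpy as np
from scipy import stats

x = np.array([44.8, 45.1, 45.3, 45.9, 46.0, 46.2, 46.4, 46.8, 47.1, 47.4,
              47.8, 48.0, 48.2, 48.6, 48.9, 49.1, 49.5, 49.9, 50.4, 50.8,
              51.3, 51.9, 52.4, 52.8, 53.4, 54.1, 55.0, 56.4, 58.7, 63.2])

print("mean  :", round(x.mean(), 2))               # 50.05
print("median:", np.median(x))                     # 49.0
print("stdev :", round(x.std(ddof=1), 2))          # 4.31
print("range :", round(x.max() - x.min(), 1))      # 18.4
q1, q3 = np.percentile(x, [25, 75])
print("IQR   :", round(q3 - q1, 2))                # 5.40
print("skew  :", round(stats.skew(x, bias=False), 2))       # ~1.19
print("kurt  :", round(stats.kurtosis(x, bias=False), 2))   # ~1.27 (excess)
```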

Graphical Methods for Depicting Data

Each chart below is rendered from real sample data. Understanding the shape, landmarks, and interpretation of each is essential in everyday engineering practice.

Chart 1

Histogram

What it shows: Frequency distribution — the shape, centre, and spread of continuous data. Values are grouped into bins; bar height = count in that bin. Bars touch (no gaps) because data is continuous.

Key features: Shape reveals distribution type — normal (bell), right-skewed, left-skewed, bimodal, or uniform. Overlay a normal curve to visually pre-check normality before running a Q-Q plot.

Engineering Use

First look at any dataset. Identify modality, skew, and outliers before any statistical test. Required in DMAIC Measure phase.

[Histogram: measurement value (x-axis) vs frequency (y-axis), with a fitted normal curve overlay]
Chart 2

Box-and-Whisker Plot

What it shows: The five-number summary — Min, Q1, Median, Q3, Max — in a single compact visual. The box spans Q1 to Q3 (the Interquartile Range, IQR). The line inside the box is the Median. Whiskers extend to Min and Max within 1.5×IQR. Points beyond whiskers are outliers.

Key formula: IQR = Q3 − Q1. Outlier threshold = Q3 + 1.5×IQR (upper) or Q1 − 1.5×IQR (lower).

Engineering Use

Compare multiple distributions side by side. Instantly reveals skew, spread, and outliers. Use in MSA to compare operator variation.

[Box plots for Processes A, B, C: box spans Q1–Q3, line at the median, whiskers to min/max, one outlier flagged]
Chart 3

Stem-and-Leaf Plot

What it shows: The full distribution of data while keeping every original value visible. Each data point is split: the stem = leading digit(s), the leaf = the last digit. Reading the leaves left-to-right on each row gives you a mini histogram rotated 90°.

Example data: 21, 24, 26, 28, 31, 33, 35, 37, 39, 41, 43, 46, 48, 52, 55, 58

Engineering Use

Best for small datasets (n < 50). Reveals shape, outliers, and gaps — and unlike a histogram, you can read back every original data value.

Stem-and-Leaf Plot (n = 16)
Stem | Leaves
 2 | 1 4 6 8
 3 | 1 3 5 7 9
 4 | 1 3 6 8
 5 | 2 5 8
Stem = tens digit, Leaf = units digit (e.g. 2 | 4 = 24)
Chart 4

Normal Probability Plot (Q-Q Plot)

What it shows: Whether your data follows a normal distribution. Data quantiles are plotted against theoretical normal quantiles. If the data is normal, all points fall on or very close to the diagonal reference line.

Interpretation: Points hugging the line ✓ normal. S-curve = skewed. Banana curve = heavy tails. A single point far off-line = outlier. Use p-value > 0.05 (Anderson-Darling or Kolmogorov-Smirnov) to confirm at 95% confidence.

Engineering Use

Required before running capability analysis (Cp/Cpk). Non-normal data must be transformed or analyzed with non-parametric methods.

[Q-Q plot: sample quantiles vs theoretical standard-normal quantiles; points hug the reference line; AD p-value = 0.312 > 0.05 → Normal ✓]
When to use which chart
Histogram
Shape & distribution of large datasets. First step in any analysis.
Box-and-Whisker
Compare multiple groups. Spot outliers and skew at a glance.
Stem-and-Leaf
Small datasets (n < 50). See every original value in context.
Q-Q Plot
Test normality before Cp/Cpk. Always run before capability studies.

Probability — Models, Rules & Distributions

Probability Models

Classic (A Priori) Model
P(A) = Outcomes in A / Total outcomes

Used when all outcomes are equally likely and can be counted theoretically. Example: P(rolling a 3) = 1/6. No experiment needed.

Relative Frequency (Empirical) Model
P(A) = Times A occurred / Total trials

Used when theoretical probability is unknown — estimate from observed data. Approaches true probability as n → ∞. Example: defect rate from production history.

Counting — Factorial, Permutations & Combinations

Concept | Formula | Order matters? | Example
Factorial | n! = n×(n−1)×…×1; 0! = 1 | — | 5! = 5×4×3×2×1 = 120
Permutation | P(n,r) = n!/(n−r)! | Yes — order matters | 4-digit lock code with distinct digits: P(10,4) = 5,040 arrangements
Combination | C(n,r) = n!/[r!(n−r)!] | No — order irrelevant | Select 2 from 5 students: C(5,2) = 10 groups

Key Probability Distributions — Summary Table

Distribution | Type | Key parameters | Conditions / when to use | Mean | Variance
Normal | Continuous | μ, σ | Symmetric, bell-shaped. Central Limit Theorem. 68/95/99.7 rule. Z = (X−μ)/σ | μ | σ²
t (Student's) | Continuous | df = n−1 | Small samples (n < 30) or unknown σ. Wider than normal; converges to normal as df→∞ | 0 | df/(df−2)
Chi-square (χ²) | Continuous | df = n−1 | Testing population variance; goodness of fit; independence in contingency tables. χ² = (n−1)s²/σ² | df | 2·df
F | Continuous | df₁, df₂ | Comparing two variances; ANOVA F-ratio = MS_between/MS_within. Always right-tailed. | df₂/(df₂−2) |
Binomial | Discrete | n, p | Fixed n trials; 2 outcomes; constant p; independent. P(x) = C(n,x)·pˣ·(1−p)ⁿ⁻ˣ | np | np(1−p)
Bernoulli | Discrete | p | Binomial with n=1 (single trial). P(success) = p, P(failure) = 1−p | p | p(1−p)
Hypergeometric | Discrete | N, A, n | Sampling without replacement from a finite population. Use instead of binomial when n > 5% of N. P(x) = C(A,x)·C(N−A,n−x)/C(N,n) | nA/N |
Poisson | Discrete | μ | Rare events in a fixed region. Mean = variance = μ. P(x;μ) = e⁻μ·μˣ/x! | μ | μ

Confidence Intervals — Complete Reference

A confidence interval provides a range within which the true population parameter is believed to lie with a stated probability (confidence level). The width is controlled by sample size, standard deviation, and confidence level.

CI for Mean — z-based (σ known or n ≥ 30)

CI = x̄ ± zα/2 · (σ / √n)
Confidence | α | zα/2
90% | 0.10 | 1.645
95% | 0.05 | 1.96
99% | 0.01 | 2.576
Worked Example

100 random residents, x̄ = $42,000, σ = $5,000. Find 95% CI.

CI = 42,000 ± 1.96 × (5,000/√100)
CI = 42,000 ± 1.96 × 500
CI = 42,000 ± 980
CI = $41,020 to $42,980

CI for Mean — t-based (σ unknown and n < 30)

CI = x̄ ± tα/2, n-1 · (s / √n)

Use t-distribution with (n−1) degrees of freedom. As n increases, t → z.

Worked Example

n=25, x̄=$42,000, s=$5,000. Find 95% CI. t0.025,24 = 2.064

CI = 42,000 ± 2.064 × (5,000/√25)
CI = 42,000 ± 2,064
CI = $39,936 to $44,064

CI for Proportion

CI = p̂ ± zα/2 · √(p̂(1−p̂)/n)

Conditions: np ≥ 5 AND n(1−p) ≥ 5 (to approximate binomial with normal)

Worked Example

n=100, 10 defective (p̂=0.10). Find 95% CI.

np=10 ≥ 5 ✓ n(1−p)=90 ≥ 5 ✓
CI = 0.10 ± 1.96×√(0.10×0.90/100)
CI = 0.10 ± 0.06
CI = 0.04 to 0.16 (4% to 16%)

CI for Variance (Chi-square)

(n−1)s² / χ²α/2 ≤ σ² ≤ (n−1)s² / χ²1-α/2

χ² is not symmetric — use two separate chi-square table values for the two tails.

Worked Example

n=25, s²=4. Find 90% CI for σ². χ²0.05,24=36.42, χ²0.95,24=13.848

Lower: (24×4)/36.42 = 2.64
Upper: (24×4)/13.848 = 6.93
90% CI for σ²: 2.64 to 6.93
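All four worked examples can be checked in a few lines (Python with SciPy):

```python
import numpy as np
from scipy import stats

z = stats.norm.ppf(0.975)                                 # 1.96

# z-based CI for a mean: n = 100, xbar = 42,000, sigma = 5,000
print("z CI : 42000 +/-", round(z * 5000 / np.sqrt(100)))  # 980

# t-based CI: n = 25, s = 5,000
t = stats.t.ppf(0.975, df=24)                              # 2.064
print("t CI : 42000 +/-", round(t * 5000 / np.sqrt(25)))   # 2064

# CI for a proportion: 10 defective out of 100
p = 0.10
half = z * np.sqrt(p * (1 - p) / 100)
print(f"p CI : {p - half:.2f} to {p + half:.2f}")          # 0.04 to 0.16

# Chi-square CI for a variance: n = 25, s^2 = 4, 90% confidence
lo = 24 * 4 / stats.chi2.ppf(0.95, df=24)
hi = 24 * 4 / stats.chi2.ppf(0.05, df=24)
print(f"var CI: {lo:.2f} to {hi:.2f}")                     # 2.64 to 6.93
```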

Hypothesis Testing — 38 Tests, 6 Families

Every hypothesis test follows the same 6-step logic. What changes is the test statistic and its distribution. Master the framework once — apply it to all 38 tests.

The 6 Families — Complete Decision Tree
HYPOTHESIS TEST — what type of data & question?
① Parametric means (9 tests): 1-Sample z, 1-Sample t, 2-Sample z, Independent t (pooled), Welch's t, Paired t, One-Way ANOVA, Two-Way ANOVA, Repeated Measures ANOVA
② Post-hoc (5 tests): Tukey HSD, Bonferroni, Scheffé, Duncan's, Newman-Keuls
③ Proportions (7 tests): 1-Proportion z, 2-Proportion z, χ² Goodness of Fit, χ² Independence, Fisher's Exact, McNemar's, Cochran's Q
④ Variance (5 tests): F-test (2 variances), Levene's, Bartlett's, χ² variance test, Brown-Forsythe
⑤ Non-parametric (7 tests): Wilcoxon Signed-Rank, Mann-Whitney U, Kruskal-Wallis, Friedman, Sign test, Spearman ρ, Kendall's τ
⑥ Correlation / regression / normality (8 tests): Pearson r, Regression t-test, Overall F (regression), Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, Durbin-Watson, Log-Rank
Select a family tab below to explore each test with formulas, graphs & worked examples.

Universal 6-Step Framework — Every Test Uses This

1. State H₀. Null: no effect, no difference, status quo.
2. State H₁. Alternative: what you want to prove.
3. Set α. Usually 0.05; decide before seeing the data.
4. Compute. Calculate the test statistic from your data.
5. p-value. P(data this extreme | H₀ true); p < α → reject H₀.
6. Conclude. Engineering meaning, not just reject/fail to reject.
Family ① — 9 Tests
Parametric Means Tests

Use when your response is continuous and approximately normally distributed (or n ≥ 30, by the Central Limit Theorem). You are comparing one or more means. If normality is badly violated with small n, switch to Family ⑤ non-parametric alternatives.

1 · One-Sample z-Test
σ known · n ≥ 30
When to Use This Test
✓ You have one sample and want to test whether its mean equals a known target μ₀
✓ Population std dev σ is known from engineering specs or prior studies
n ≥ 30 — CLT ensures sampling distribution is approximately normal even if data isn't
✗ σ unknown and n < 30 — use the one-sample t-test instead
The Formula
z = (x̄ − μ₀) / (σ / √n)
Symbol | Meaning | In practice
x̄ | Sample mean | Average of your n measurements
μ₀ | Hypothesised population mean | The target or specification value you are testing against
σ | Population standard deviation | Known from process history, engineering spec, or prior studies
n | Sample size | Number of observations in your sample
σ/√n | Standard error of the mean | How much x̄ varies from sample to sample — shrinks as n grows
Decision rule:  Two-tail: reject H₀ if |z| > zα/2  ·  Upper-tail: reject if z > zα  ·  Lower-tail: reject if z < −zα
z0.025 = 1.960 (two-tail 95%)    z0.05 = 1.645 (one-tail 95%)
Engineering Example — CNC Bolt Diameter
Scenario: A CNC machine produces bolts with a specified diameter of μ₀ = 10.000 mm. From historical process data, σ = 0.050 mm is known. A quality engineer samples n = 64 bolts and measures x̄ = 10.012 mm. Has the machine drifted off-centre? Use α = 0.05, two-tail.
Step-by-Step Solution
① State Hypotheses
H₀: μ = 10.000 mm
H₁: μ ≠ 10.000 mm (two-tail)
② Calculate Standard Error
SE = σ/√n = 0.050/√64 = 0.050/8
SE = 0.00625 mm
③ Compute Test Statistic
z = (10.012 − 10.000) / 0.00625
z = 0.012 / 0.00625 = 1.92
④ Find Critical Value
zcrit = ±1.960 (α=0.05, two-tail)
p-value ≈ 0.055
⑤ Decision
|1.92| < 1.960 → Fail to reject H₀
p = 0.055 > α = 0.05
Rejection Region Diagram
[Rejection region diagram: two-tail boundaries at ±1.96; z = 1.92 falls just inside the +1.96 boundary — p = 0.055, borderline]
Engineering Conclusion: No statistical evidence the machine has drifted at the 5% level. However p = 0.055 is borderline — increase sample size to n = 100 to detect a 0.012 mm shift more reliably, or investigate if the drift direction is consistently positive.
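The same five steps in code (Python with SciPy):

```python
from scipy.stats import norm

mu0, sigma, n, xbar = 10.000, 0.050, 64, 10.012
se = sigma / n ** 0.5                 # 0.00625
z = (xbar - mu0) / se                 # 1.92
p = 2 * norm.sf(abs(z))               # two-tailed p-value
print(f"z = {z:.2f}, p = {p:.3f}")    # z = 1.92, p = 0.055
```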
2 · One-Sample t-Test
σ unknown · any n
When to Use This Test
✓ One sample — comparing mean to a known target μ₀
✓ σ is unknown — you estimate it from your sample as s
✓ Works for any sample size — even n = 5 or n = 10
📌 The most common single-sample test in practice — default when σ is unknown
The Formula
t = (x̄ − μ₀) / (s / √n)
degrees of freedom  df = n − 1
Symbol | Meaning | In practice
x̄ | Sample mean | Average of your measurements
μ₀ | Hypothesised mean | The target or spec value you are testing against
s | Sample standard deviation | Estimated from data: s = √[Σ(xᵢ−x̄)²/(n−1)]
s/√n | Standard error of the mean | Uncertainty in x̄ due to finite sample size
df | Degrees of freedom | n−1. Determines which t-distribution to use. t → z as df → ∞
Key difference from z-test: The t-distribution has heavier tails than the normal distribution, making it harder to reject H₀ with small samples — correctly accounting for the extra uncertainty from estimating σ with s. As n increases, t → z.
Engineering Example — Fill Weight Verification
Scenario: A packaging line targets μ₀ = 500 g fill weight. An engineer collects a sample of n = 9 packs and measures: x̄ = 497 g, s = 6 g. Is the machine under-filling? Use α = 0.05, lower one-tail (we only care if it's too low).
① Hypotheses
H₀: μ ≥ 500g    H₁: μ < 500g
② Standard Error
SE = s/√n = 6/√9 = 6/3 = 2.0 g
③ Test Statistic
t = (497 − 500) / 2.0 = −3/2 = −1.50
④ Critical Value (df = 8)
tcrit = −1.860 (lower-tail, α=0.05, df=8)
⑤ Decision
−1.50 > −1.860 → Fail to reject H₀
p ≈ 0.086 > 0.05
Conclusion: No statistical evidence of under-filling at 5% level. p = 0.086 is noteworthy though — with n = 9 this test has low power. Increase to n = 25 to detect a 3g shift reliably.
t vs z — Why Tails Are Heavier
[t vs z comparison: t (df=8) has heavier tails than the z reference; critical value −1.860, observed t = −1.50 falls in the fail-to-reject region]
3 · Two-Sample z-Test
2 groups · σ₁ σ₂ known · n₁ n₂ ≥ 30
When to Use This Test
✓ Comparing means of two independent groups
✓ Both σ₁ and σ₂ are known, OR both n₁ ≥ 30 and n₂ ≥ 30
✓ Samples are drawn independently from two populations
✗ σ unknown or small n — use independent t-test (Tests 4 or 5)
The Formula
z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
Symbol | Meaning
x̄₁ − x̄₂ | Observed difference between the two sample means
σ₁², σ₂² | Known population variances for groups 1 and 2
√(σ₁²/n₁ + σ₂²/n₂) | Standard error of the difference — how much x̄₁−x̄₂ varies by chance
Engineering Example — Comparing Two Production Plants
Scenario: Plant A and Plant B both produce aluminium castings. Historical σ values are known from long-running process control. Does tensile strength differ between plants? α = 0.05, two-tail.
Plant A: n₁=40, x̄₁=52.1 MPa, σ₁=2.0
Plant B: n₂=35, x̄₂=50.8 MPa, σ₂=2.2
SE = √(4.0/40 + 4.84/35)
SE = √(0.100 + 0.138) = √0.238 = 0.488
z = (52.1−50.8) / 0.488 = 1.3/0.488 = 2.66
zcrit = ±1.960 (two-tail, α=0.05)  ·  2.66 > 1.960 → Reject H₀  ·  p ≈ 0.008
Conclusion: Plants A and B produce significantly different tensile strengths. Plant A averages 1.3 MPa higher — investigate process differences.
4 & 5 · Independent t-Test — Pooled & Welch's
2 groups · σ unknown
How to choose: First run an F-test or Levene's test for equal variances. If variances are equal → Pooled t (Test 4). If variances are unequal, or you are unsure → Welch's t (Test 5). When in doubt, Welch's is the safer default — it is slightly conservative but never wrong.
Test 4 — Pooled t (Equal Variances)
t = (x̄₁ − x̄₂) / [Sp × √(1/n₁ + 1/n₂)]
Sp² = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁+n₂−2)
df = n₁ + n₂ − 2
Symbol | Meaning
Sp | Pooled standard deviation — weighted average of s₁ and s₂
Sp² | Pooled variance — borrows strength from both samples
df | n₁+n₂−2 — more df means a narrower t-distribution, easier to reject
Test 5 — Welch's t (Unequal Variances)
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)]
Symbol | Meaning
s₁², s₂² | Individual sample variances — not pooled
df (Welch) | Welch-Satterthwaite equation — df is non-integer, usually lower than the pooled df
Engineering Example — Two Welding Processes
Scenario: Two MIG welding processes are compared for joint strength (kN). F-test confirms equal variances. Are the mean strengths different? α = 0.05, two-tail.
Data
Process A: n=10, x̄=52.3, s=2.1
Process B: n=12, x̄=50.1, s=2.3
Pooled Variance
Sp² = (9×4.41 + 11×5.29) / 20
= (39.69 + 58.19) / 20 = 4.894
Sp = √4.894 = 2.212
Test Statistic
SE = 2.212 × √(1/10 + 1/12) = 2.212 × 0.4282 = 0.947
t = (52.3 − 50.1) / 0.947 = 2.32
df = 10+12−2 = 20
Decision
tcrit(20df, α=0.05) = ±2.086
2.32 > 2.086 → Reject H₀
Conclusion: Process A produces significantly stronger joints than Process B (mean difference = 2.2 kN, p ≈ 0.031). The pooled t-test was appropriate because the F-test confirmed equal variances (F = 1.20 < Fcrit = 3.07).

If variances were unequal: Switch to Welch's t. Welch's df would be approximately 19.7 (non-integer) — slightly fewer df, slightly wider critical region, but still p < 0.05 in this case.
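SciPy can run both variants directly from the summary statistics above, without the raw data (a sketch):

```python
from scipy.stats import ttest_ind_from_stats

# Process A: n=10, mean=52.3, s=2.1 · Process B: n=12, mean=50.1, s=2.3
pooled = ttest_ind_from_stats(52.3, 2.1, 10, 50.1, 2.3, 12, equal_var=True)
welch = ttest_ind_from_stats(52.3, 2.1, 10, 50.1, 2.3, 12, equal_var=False)
print(f"pooled: t = {pooled.statistic:.2f}, p = {pooled.pvalue:.3f}")  # t = 2.32
print(f"Welch : t = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")
```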
6 · Paired t-Test
Same subjects · Before / After
When to Use This Test
✓ The same units / subjects are measured twice — before and after a treatment
Matched pairs — two sensors measuring same part, left vs right side, twin studies
More powerful than 2-sample t — eliminates between-subject variability by focusing only on within-subject change
✗ NOT for independent groups — pairing where none exists inflates Type I error
The Formula — Reduce to a One-Sample t on the Differences
Step 1: compute differences   dᵢ = Y₁ᵢ − Y₂ᵢ
t = d̄ / (s_d / √n)
df = n − 1    H₀: μ_d = 0
Symbol | Meaning | How to calculate
dᵢ | Individual differences | dᵢ = Y₁ᵢ − Y₂ᵢ for each pair i
d̄ | Mean of the differences | d̄ = Σdᵢ / n
s_d | Standard deviation of differences | s_d = √[Σ(dᵢ−d̄)² / (n−1)]
s_d/√n | Standard error of the mean difference | How precisely d̄ estimates μ_d
Engineering Example — Vibration Damper Before / After
Scenario: A new vibration damper is fitted to 8 identical machine tools. Vibration amplitude (mm/s) is measured before and after on the same machine. Did the damper reduce vibration? α = 0.05, one-tailed (with d = before − after, a reduction means μ_d > 0).
Machine | Before (Y₁) | After (Y₂) | d = Y₁−Y₂ | (d − d̄)²
1 | 8.4 | 6.9 | +1.5 | 0.0039
2 | 7.1 | 5.8 | +1.3 | 0.0689
3 | 9.2 | 7.4 | +1.8 | 0.0564
4 | 6.5 | 5.2 | +1.3 | 0.0689
5 | 8.8 | 7.1 | +1.7 | 0.0189
6 | 7.6 | 6.0 | +1.6 | 0.0014
7 | 9.0 | 7.5 | +1.5 | 0.0039
8 | 8.1 | 6.3 | +1.8 | 0.0564
Sum / Mean | | | d̄ = 1.5625 | Σ = 0.2788
Standard Deviation of Differences
s_d = √(0.2788 / 7) = √0.03983 = 0.1996
Test Statistic
t = 1.5625 / (0.1996/√8)
t = 1.5625 / 0.0706 = 22.1
Critical Value (df = 7)
tcrit(7 df, one-tail, α=0.05) = 1.895
(reject if t > +1.895, since d = before − after and H₁: μ_d > 0)
Decision & Conclusion
22.1 ≫ 1.895 → Reject H₀
The damper significantly reduces vibration.
Mean reduction: 1.56 mm/s (19.3% of the before mean)
Before vs After — All Differences Positive
[Slope chart: mean before = 8.09, mean after = 6.53 mm/s — all 8 lines slope downward, a consistent, significant reduction]
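The same test in one call (Python with SciPy ≥ 1.6; alternative='greater' tests μ_d > 0 for d = before − after):

```python
from scipy import stats

before = [8.4, 7.1, 9.2, 6.5, 8.8, 7.6, 9.0, 8.1]
after = [6.9, 5.8, 7.4, 5.2, 7.1, 6.0, 7.5, 6.3]

res = stats.ttest_rel(before, after, alternative="greater")
print(f"t = {res.statistic:.1f}, p = {res.pvalue:.2e}")   # t = 22.1, p << 0.001
```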
7 · One-Way ANOVA — Comparing 3 or More Means
k ≥ 3 groups · F-statistic
When to Use This Test
✓ Comparing means of 3 or more independent groups simultaneously
✓ One categorical factor (treatment) with k levels
⚡ Running multiple t-tests instead inflates α — k=4 groups requires 6 pairwise tests → family error = 1−0.95⁶ = 26%
📌 A significant F only tells you at least one pair differs — follow with Tukey HSD to find which pairs (Family ②)
The Formula — Decomposing Total Variation
F = MS_Between / MS_Within
MS_Between = SS_B / (k−1)     MS_Within = SS_W / (N−k)
Symbol | Meaning | Formula
SS_Between | Variation due to the factor (between groups) | Σ nᵢ (x̄ᵢ − x̄)²
SS_Within | Variation within groups (random error) | Σ Σ (xᵢⱼ − x̄ᵢ)²
MS_Between | Mean square between — treatment effect estimate | SS_B / (k−1)
MS_Within | Mean square within — pure noise estimate | SS_W / (N−k)
F | Ratio of treatment signal to noise. If H₀ is true, F ≈ 1. Large F → groups differ. | MS_B / MS_W
Engineering Example — 3 Adhesive Curing Temperatures
Scenario: An adhesive bond strength (MPa) is measured at 3 curing temperatures. 4 specimens per group. Does curing temperature significantly affect bond strength? α = 0.05.
120°C (A) | 150°C (B) | 180°C (C)
12.1 | 15.3 | 10.8
11.8 | 16.1 | 11.2
12.5 | 14.8 | 10.5
12.2 | 15.6 | 10.9
x̄ = 12.15 | x̄ = 15.45 | x̄ = 10.85
Grand Mean
x̄ = (12.15 + 15.45 + 10.85)/3 = 12.817
SS_Between
SS_B = 4[(12.15−12.817)² + (15.45−12.817)² + (10.85−12.817)²]
= 4[0.444 + 6.934 + 3.868] = 44.99
SS_Within
SS_W = Σ(within-group deviations²)
= 0.25 + 0.89 + 0.25 = 1.39
ANOVA Table
Source | SS | df | MS | F
Between | 44.99 | 2 | 22.49 | 145.6
Within | 1.39 | 9 | 0.154 |
Total | 46.38 | 11 | |
Decision
Fcrit(2, 9, α=0.05) = 4.26
145.6 ≫ 4.26 → Reject H₀
p < 0.001
Conclusion: Curing temperature significantly affects bond strength. 150°C produces the highest mean (15.45 MPa). Now run Tukey HSD (Family ②) to confirm all three pairs are significantly different.
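A one-call check of the ANOVA table (Python with SciPy):

```python
from scipy import stats

a = [12.1, 11.8, 12.5, 12.2]   # 120 C
b = [15.3, 16.1, 14.8, 15.6]   # 150 C
c = [10.8, 11.2, 10.5, 10.9]   # 180 C

f, p = stats.f_oneway(a, b, c)
print(f"F = {f:.1f}, p = {p:.2e}")   # F = 145.6, p << 0.001
```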
8 · Two-Way ANOVA
Two factors tested simultaneously
F_A = MS_A / MS_Error
F_B = MS_B / MS_Error
F_AB = MS_AB / MS_Error
  • ✓ 2 categorical factors (A and B)
  • ✓ Tests main effects A, B, and interaction AB
  • ✓ More efficient than two separate one-way ANOVAs
  • ⚡ Detects synergy/interference between factors
Example: Machine type (A: 3 models) × Operator shift (B: Day/Night) → Tensile strength. Two-Way ANOVA reveals whether a specific machine performs better on a specific shift — an AB interaction.
9 · Repeated Measures ANOVA
Same subjects, 3+ time points
SS_Within = SS_Treatment + SS_Error
F = MS_Treatment / MS_Error
Check sphericity with Mauchly's test
  • ✓ Same subjects measured at k ≥ 3 time points
  • ✓ Removes between-subject variation → more power
  • ✓ Assumes sphericity (equal variance of differences)
  • 📌 Non-parametric alternative: Friedman Test (Family ⑤)
Example: 10 operators measured at 4 time points during a shift. RM-ANOVA tests whether fatigue causes a significant and consistent change in accuracy over time across all operators.
Family ② — 5 Tests
Post-Hoc Tests

Run ONLY after a significant ANOVA F-test. ANOVA tells you at least one pair differs — post-hoc tests identify which pairs. Running them without a significant F first inflates Type I error and produces false positives.

Why You Cannot Just Run Multiple t-Tests — The α Inflation Problem
3 groups: C(3,2) = 3 comparisons → α_family = 14.3%. 5 groups: C(5,2) = 10 comparisons → α_family = 40.1%. 10 groups: C(10,2) = 45 comparisons → α_family ≈ 90%. Formula: α_family = 1 − (1−α)^m, where m = number of comparisons, each run at α = 0.05.
1 · Tukey's HSD — Default Choice for All Pairwise Comparisons
balanced design · equal n
When to Use
✓ After a significant one-way ANOVA F-test
✓ All pairwise comparisons needed simultaneously
✓ Equal group sizes (balanced design)
📌 Best power-to-α-control balance — the default post-hoc choice in most engineering settings
The Formula
HSD = qα,k,df_W × √(MS_W / n)
Reject H₀ for pair (i,j) if |x̄ᵢ − x̄ⱼ| > HSD
Symbol | Meaning | Detail
qα,k,df_W | Studentised range critical value | From the q-table: depends on α, k (number of groups), df_W (within-group df)
MS_W | Mean square within (from the ANOVA table) | Pooled error estimate — same value used in the ANOVA F-test
n | Group size (equal across groups) | Number of observations per treatment group
HSD | Honestly Significant Difference | The minimum difference required between two means to declare significance
Engineering Example — 3 Curing Temperatures for Adhesive
Scenario: One-Way ANOVA found F = 145.6, p < 0.001 (significant). Three curing temperatures: A=120°C (x̄=12.15 MPa), B=150°C (x̄=15.45 MPa), C=180°C (x̄=10.85 MPa). k=3 groups, n=4 per group, MS_W=0.1544, df_W=9. Which pairs are significantly different?
① Find q critical value
q(α=0.05, k=3, df_W=9) = 3.948
(from the Studentised range table)
② Calculate HSD
HSD = 3.948 × √(0.1544/4)
= 3.948 × 0.1965 = 0.776 MPa
③ Compare all pairs vs HSD
Pair | Abs. diff | Significant?
B vs C | 4.60 | ✓ YES
B vs A | 3.30 | ✓ YES
A vs C | 1.30 | ✓ YES
Conclusion: All three temperatures produce significantly different bond strengths. Optimal: 150°C (B) at 15.45 MPa.
Mean Comparison Plot — HSD Intervals
[Mean comparison plot: C = 10.85, A = 12.15, B = 15.45 MPa with ±HSD/2 error bars — every pairwise gap exceeds the HSD of 0.776]
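SciPy (≥ 1.8) ships a Tukey HSD routine, so the table above can be checked directly (a sketch):

```python
from scipy import stats

a = [12.1, 11.8, 12.5, 12.2]   # 120 C
b = [15.3, 16.1, 14.8, 15.6]   # 150 C
c = [10.8, 11.2, 10.5, 10.9]   # 180 C

res = stats.tukey_hsd(a, b, c)
print(res)   # pairwise differences, confidence intervals and p-values
```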
2 · Bonferroni Correction
α* = α/m
✓ Pre-planned comparisons (you decided before data)
✓ Small m (≤5 comparisons) — more power than Tukey for few tests
✗ Large m (≥10) — becomes too conservative, misses real differences
📌 Universal — works for any type of test, not just ANOVA
α* = α / m
Use α* as the significance level for each individual test
Symbol | Meaning
α | Desired family-wise error rate (usually 0.05)
m | Total number of comparisons being made
α* | Adjusted threshold — use this for each individual t-test
Example — Same 3-Temperature ANOVA
m = 3 pairs, α = 0.05
α* = 0.05/3 = 0.0167
tcrit(df=9, α*=0.0167, two-tail) ≈ 2.933

Pair B vs A: t = (15.45−12.15)/√(2×0.1544/4)
= 3.30/0.278 = 11.9 > 2.933 → Significant
3 · Scheffé Test
complex contrasts
✓ Any linear contrast, not just pairwise comparisons
✓ Unplanned comparisons after viewing data
✗ Lowest power for simple pairs — Tukey is better for pairwise
📌 Most conservative — safest for data-dredging protection
F* = (k−1) × Fα, k-1, N-k
Critical value for any contrast — more demanding than Tukey
Symbol | Meaning
k | Number of groups in the ANOVA
Fα,k−1,N−k | Critical F from the original ANOVA test
F* | Scheffé critical value for any contrast
Example — Complex Contrast: B vs Average of A & C
F* = (3−1) × F(0.05, 2, 9) = 2 × 4.26 = 8.52

Contrast L = x̄_B − (x̄_A + x̄_C)/2
= 15.45 − (12.15 + 10.85)/2 = 3.95 MPa

F_contrast = L² / (MS_W × Σ(cᵢ²/nᵢ)), with c = (1, −½, −½): Σ(cᵢ²/nᵢ) = 1.5/4 = 0.375
= 3.95² / (0.1544 × 0.375) = 15.60 / 0.0579 ≈ 270
270 ≫ 8.52 → Significant
4 · Newman-Keuls (SNK) Step-Down Test
step-down · higher power
When to Use This Test
✓ After significant ANOVA — all pairwise comparisons needed
✓ Higher power than Tukey when k is large (4+ groups)
✗ Family-wise α not fully controlled — some Type I inflation possible
✗ Not recommended for confirmatory regulatory submissions — use Tukey
The Formula
q_p = (x̄_max − x̄_min) / SE
SE = √(MS_W / n)    p = span (number of means in range, from 2 to k)    df = N−k
SymbolMeaningDetail
q_pStudentised range statistic for a span of p meansCritical value CHANGES with p — wider spans use larger q_p
pNumber of means in the range being comparedp=2: adjacent pair · p=k: full range. Wider span = larger critical value.
SEStandard error of a group meanSE = √(MS_W/n) — same as in Tukey
Step-downProcedure orderCompare largest span first (p=k). If not significant, stop. Proceed inward only if larger span is significant.
Engineering Example
Scenario: Four coating processes (k=4): A=10.8, B=12.1, C=15.4, D=11.3 MPa. n=4 per group, MS_W=0.228, df_W=12. Run SNK after significant ANOVA.
① Rank means lowest → highest
A=10.8 < D=11.3 < B=12.1 < C=15.4
② SE and step-down q values
SE = √(0.228/4) = 0.2387
q(p=4, df=12, 0.05) = 4.199 → R₄ = 1.002
q(p=3, df=12, 0.05) = 3.773 → R₃ = 0.900
q(p=2, df=12, 0.05) = 3.082 → R₂ = 0.736
③ Start widest span (p=4): A vs C
|15.4−10.8| = 4.60 > 1.002 → Sig ✓
Advantage over Tukey
Adjacent pairs use smaller q (3.082 vs 4.199) — more power to detect close means. Trade-off: slight α inflation for distant pairs.
Key insight — why step-down works: The procedure uses progressively smaller critical values as spans narrow. For adjacent means (p=2), q=3.082 gives a much tighter threshold than Tukey's fixed q=4.199. This is why SNK detects differences that Tukey misses — but at the cost of slightly elevated Type I error for pairs that span many means.

When to use vs Tukey: Exploratory manufacturing studies where maximising detection matters more than strict family-wise α control. For process improvement decisions (not regulatory submissions).
5 · Duncan's Multiple Range Test
liberal · exploratory only
When to Use This Test
✓ Highest statistical power — detects the smallest real differences between means
✓ Exploratory research where missing a real effect is the greater concern
✗ Weakest family-wise α control — highest false positive rate of all 5 post-hoc tests
✗ Not appropriate for confirmatory engineering or regulatory studies — use Tukey
The Formula
α_p = 1 − (1−α)^(p−1)
Protection level varies with range p. Smallest ranges use nominal α; larger ranges allow higher error.
Symbol | Meaning | Detail
p | Number of means in the comparison range | p=2: both means adjacent in ranking · p=k: entire range
α_p | Effective significance level for a span of p means | α_p increases with p — least conservative for wide spans
R_p | Critical range for span p | R_p = q_p(α_p, df) × SE — smaller than Tukey at each step
Engineering Example
Scenario: Same 4-process data. Duncan's uses α_p = 1−(1−0.05)^(p−1) at each step, giving critical ranges that are tighter than both Tukey and SNK. Shows how liberal the test is.
For p=2: α_2=1−0.95^1=0.050 → q_2=3.082
For p=3: α_3=1−0.95^2=0.098 → q_3=2.779
For p=4: α_4=1−0.95^3=0.143 → q_4=2.663

Duncan R_p (SE=0.2387):
R_2=0.736 R_3=0.663 R_4=0.635

Compare to Tukey HSD=1.002 for all spans.
Duncan flags more pairs as significant.
Critical warning: Duncan's test is the most liberal post-hoc test available. It does NOT control family-wise error rate in the traditional sense. With k=10 groups, the effective α for distant comparisons can exceed 40%. Use only in exploratory biological or agricultural research where power is paramount.
Test | α control | Power
Tukey | Exact ✓ | High
Bonferroni | Conservative ✓ | Moderate
Newman-Keuls | Partial ⚠ | Higher
Duncan | Weakest ✗ | Highest
Family ③ — 7 Tests
Proportions & Counts Tests

Use when your data is categorical — pass/fail, defect type, yes/no, attribute data. You are counting frequencies or testing proportions, not measuring a continuous response. The test statistic follows a z or χ² distribution.

1 · One-Proportion z-Test
binary outcome · np₀≥5
When to Use
✓ Binary outcome (defective/good, pass/fail, yes/no)
✓ Testing if a proportion equals a known standard p₀
✓ Both n×p₀ ≥ 5 AND n×(1−p₀) ≥ 5 (sample large enough)
✗ Small n where np₀ < 5 — use Fisher's Exact Test instead
The Formula
z = (p̂ − p₀) / √(p₀(1−p₀)/n)
Symbol | Meaning | Detail
p̂ | Sample proportion | p̂ = x/n where x = number of successes in a sample of n
p₀ | Hypothesised proportion | The known or target proportion under H₀ (e.g. historical defect rate)
√(p₀(1−p₀)/n) | Standard error of p̂ | How much the sample proportion varies by chance around p₀
z | Standardised test statistic | Compared to zα = 1.645 (one-tail) or zα/2 = 1.960 (two-tail)
Engineering Example — New Supplier Defect Rate
Scenario: A component supplier has a historical defect rate of p₀ = 0.04 (4%). A new production batch of n = 250 parts is received. 14 defects are found. Has the defect rate increased significantly? α = 0.05, upper one-tail test (we only care if it's higher).
① Hypotheses
H₀: p ≤ 0.04    H₁: p > 0.04
② Sample Proportion
p̂ = 14/250 = 0.056
③ Standard Error
SE = √(0.04×0.96/250)
= √(0.0001536) = 0.01239
④ Test Statistic
z = (0.056−0.04)/0.01239
z = 0.016/0.01239 = 1.29
⑤ Decision
zcrit = 1.645 (upper, α=0.05)
1.29 < 1.645 → Fail to reject H₀
p-value = 0.098
Conclusion: No statistical evidence the defect rate has increased (p=0.098). But borderline — monitor the next batch. Detecting a 4%→5.6% shift reliably (80% power, one-tail α=0.05) would take roughly n ≈ 1,000 parts.
Upper-Tail Rejection Region
[Figure: upper-tail rejection region; z=1.29 does not cross z_crit=1.645, so H₀ is not rejected (p=0.098)]
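In Python, the same test is a few lines with scipy.stats.norm:

from math import sqrt
from scipy.stats import norm

# One-proportion z-test for the supplier example above
p0, n, x = 0.04, 250, 14
p_hat = x / n                        # 0.056
se = sqrt(p0 * (1 - p0) / n)         # 0.0124, standard error under H0
z = (p_hat - p0) / se                # 1.29
print(z, norm.sf(z))                 # upper-tail p ≈ 0.098 → fail to reject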
2 · Two-Proportion z-Test
2 independent groups · pooled p
When to Use
✓ Comparing defect rates, pass rates, or proportions from two independent processes/lines/suppliers
✓ All four counts: n₁p̂₁, n₁(1−p̂₁), n₂p̂₂, n₂(1−p̂₂) all ≥ 5
✗ Paired samples (same parts inspected twice) — use McNemar's instead
📌 Use pooled proportion p̄ under H₀: p₁=p₂ — assumes equality under null
The Formula
z = (p̂₁ − p̂₂) / √[ p̄(1−p̄)(1/n₁ + 1/n₂) ]
p̄ = (x₁ + x₂) / (n₁ + n₂)   ← pooled proportion
Symbol | Meaning | Detail
p̂₁, p̂₂ | Sample proportions | Defect rates, pass rates, etc. from each group
p̄ | Pooled proportion | Combined proportion assuming H₀: p₁=p₂ is true — the best estimate of the common p
√[p̄(1−p̄)(1/n₁+1/n₂)] | Pooled standard error | Uncertainty in the difference p̂₁−p̂₂ under H₀
Engineering Example — Comparing Two Assembly Lines
Scenario: Line 1: n₁=200, 18 defects → p̂₁=0.090. Line 2: n₂=180, 9 defects → p̂₂=0.050. Are the defect rates significantly different? α=0.05, two-tail.
① Pooled Proportion
p̄ = (18+9)/(200+180) = 27/380 = 0.0711
② Pooled SE
SE = √(0.0711×0.9289×(1/200+1/180))
= √(0.06603×0.01056) = 0.02641
③ Test Statistic
z = (0.090−0.050)/0.02641 = 1.515
④ Decision
zcrit=±1.960 (two-tail)  p≈0.130
1.515<1.960 → Fail to reject H₀
Conclusion: No significant difference at α=0.05. The 4% gap (9%−5%) is not statistically significant with these sample sizes. Need n≈700 per line to detect reliably.
Proportion Comparison Visual
[Figure: bar comparison; Line 1 at 9.0% vs Line 2 at 5.0%; the 4% gap is not significant at this sample size]
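In Python, the pooled two-proportion test follows the same pattern:

from math import sqrt
from scipy.stats import norm

# Two-proportion z-test for the two assembly lines above
x1, n1, x2, n2 = 18, 200, 9, 180
p1, p2 = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)                    # pooled proportion 0.0711
se = sqrt(p_bar * (1 - p_bar) * (1/n1 + 1/n2))   # pooled SE 0.0264
z = (p1 - p2) / se                               # 1.515
print(z, 2 * norm.sf(abs(z)))                    # two-tail p ≈ 0.130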
3 · χ² Goodness of Fit
df = k−1
✓ 1 categorical variable, k categories
✓ Does observed distribution match expected?
✓ All expected counts Eᵢ ≥ 5
📌 Tests uniformity, historical match, or theoretical fit
χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
df = k − 1    Eᵢ = n × pᵢ (expected under H₀)
Symbol | Meaning
Oᵢ | Observed count in category i
Eᵢ | Expected count under H₀: Eᵢ = n × p₀ᵢ
(O−E)²/E | Squared standardised deviation — large when observed departs from expected
Example — Defect Distribution by Day of Week
Does defect frequency depend on weekday? n=250 defects across 5 days. Expected: 50/day (uniform). Observed: Mon=62, Tue=48, Wed=44, Thu=51, Fri=45.
χ²=(62−50)²/50+(48−50)²/50+(44−50)²/50+(51−50)²/50+(45−50)²/50
=2.880+0.080+0.720+0.020+0.500 = 4.20
χ²crit(df=4, α=0.05) = 9.488
4.20 < 9.488 → Fail to reject H₀
No evidence defects depend on weekday
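In Python, scipy.stats.chisquare reproduces the calculation:

from scipy.stats import chisquare

# Goodness-of-fit for the day-of-week defect counts (uniform expectation)
stat, p = chisquare([62, 48, 44, 51, 45], f_exp=[50] * 5)
print(stat, p)   # χ² = 4.20, df = 4, p ≈ 0.38 → fail to reject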
4 · χ² Test of Independence
df = (r−1)(c−1)
✓ Two categorical variables in a contingency table
✓ Are the two variables independent of each other?
✓ All expected cell frequencies ≥ 5
📌 Eᵢⱼ = (Row_i Total × Col_j Total) / Grand Total
χ² = Σᵢⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
Eᵢⱼ = (Row_i total × Col_j total) / n
Example — Defect Type vs Production Shift
Shift | Scratch | Dent | Crack | Total
Day | 18 | 12 | 5 | 35
Night | 22 | 8 | 15 | 45
Total | 40 | 20 | 20 | 80
E_Day,Crack = 35×20/80 = 8.75   E_Night,Crack = 45×20/80 = 11.25
χ² = (18−17.5)²/17.5 + (12−8.75)²/8.75 + (5−8.75)²/8.75 + (22−22.5)²/22.5 + (8−11.25)²/11.25 + (15−11.25)²/11.25
χ² = 0.014 + 1.207 + 1.607 + 0.011 + 0.939 + 1.250 = 5.03   df = (2−1)(3−1) = 2
χ²crit(2df, α=0.05) = 5.991
5.03 < 5.991 → Fail to reject H₀
No significant association between defect type and shift at α=0.05. Cracks do cluster on the night shift, but this sample is too small to confirm the effect.
[Figure: χ²=5.03 falls short of χ²crit=5.991; fail to reject H₀]
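In Python, scipy.stats.chi2_contingency computes the expected counts, df, and χ² in one call (no Yates correction is applied to tables larger than 2×2):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[18, 12, 5],     # Day:   Scratch, Dent, Crack
                  [22,  8, 15]])   # Night: Scratch, Dent, Crack
stat, p, dof, expected = chi2_contingency(table)
print(stat, p, dof)                # χ² ≈ 5.03, p ≈ 0.081, df = 2
print(expected)                    # includes E(Day,Crack) = 8.75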
Family ④ — 5 Tests
Variance Tests

Use when you need to test spread, not location. Required before independent t-tests (equal variance assumption), before ANOVA (homogeneity of variance), when comparing measurement system precision, or when a spec limit exists on process variability.

Which Variance Test? — Decision Guide
1 sample vs spec
→ χ² Variance Test
Tests if σ² = σ₀² (known target). Uses chi-square distribution. Sensitive to normality.
2 groups, normal data
→ F-Test
Simple, exact, widely understood. Fails with non-normality. Run Shapiro-Wilk first.
2+ groups, any distribution
→ Levene's Test
Robust. Default pre-ANOVA check. Use Brown-Forsythe variant for heavily skewed data.
1 · F-Test for Two Variances
2 normal groups · larger s² on top
When to Use
✓ Two independent samples from normal distributions
✓ Testing if σ₁² = σ₂² (prerequisite before pooled t-test)
✗ Non-normal data — use Levene's Test instead
📌 Always put the larger s² in numerator → right-tail test only
The Formula
F = s₁² / s₂²   (s₁² ≥ s₂²)
df₁ = n₁ − 1 (numerator)    df₂ = n₂ − 1 (denominator)
Symbol | Meaning | Detail
s₁² | Larger sample variance (numerator) | Always put the larger variance on top to ensure F ≥ 1
s₂² | Smaller sample variance (denominator) | From the group with the smaller variance
df₁, df₂ | Degrees of freedom for each group | df = n − 1 for each group. Determines which F-distribution to use.
F | Ratio of variances | Under H₀ (equal variances), F ≈ 1. Large F → variances differ significantly.
Engineering Example — Two Moulding Machines
Scenario: Two injection moulding machines produce the same part. Machine A: n=10 parts, s=1.8mm. Machine B: n=8 parts, s=0.9mm. Do the machines have significantly different variation? α=0.05, two-tail (testing either direction).
① Hypotheses
H₀: σ_A²=σ_B²    H₁: σ_A²≠σ_B²
② Put larger variance on top
s_A=1.8mm > s_B=0.9mm
F = 1.8²/0.9² = 3.24/0.81 = 4.00
df₁=9 (Machine A), df₂=7 (Machine B)
③ Critical Value (upper tail at α/2 = 0.025)
Fcrit(df₁=9, df₂=7, α/2=0.025) = 4.82
(with the larger variance on top, a two-tail test at α puts α/2 in the upper tail; the α=0.05 point, 3.68, corresponds to a two-tail test at α=0.10)
④ Decision
4.00 < 4.82 → Fail to reject H₀ at α=0.05 (significant only at α=0.10)
Conclusion: Machine A shows 4× the variance of Machine B, suggestive but just short of two-tail significance at α=0.05. Treat the variances as potentially unequal: use Welch's t-test (not pooled) for comparing means, and investigate Machine A's process stability.
F(9,7) Distribution — Right-Tail Test
[Figure: F(9,7) density; F=4.00 lies between the α=0.10 point (3.68) and the α=0.05 point (4.82), a borderline result]
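In Python, a sketch of the test with scipy.stats.f:

from scipy.stats import f

# F-test for the two moulding machines (larger variance on top)
s_a, n_a, s_b, n_b = 1.8, 10, 0.9, 8
F = s_a**2 / s_b**2                            # 4.00
df1, df2 = n_a - 1, n_b - 1                    # 9, 7
crit = f.ppf(0.975, df1, df2)                  # upper α/2 point ≈ 4.82
p_two_tail = 2 * min(f.sf(F, df1, df2), f.cdf(F, df1, df2))
print(F, crit, p_two_tail)                     # 4.00 < 4.82 → fail to reject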
2 · Levene's Test — Robust Equality of Variances
2+ groups · robust · any distribution
When to Use This Test
✓ 2 or more groups — checking equal variances before ANOVA or independent t-test
✓ Data may not be perfectly normal — Levene's is robust to non-normality
📌 The recommended default pre-ANOVA variance check in most engineering contexts
✗ Heavily skewed data with outliers — use Brown-Forsythe (median-based) instead
The Formula
zᵢⱼ = |Yᵢⱼ − Ȳᵢ|   →   Run One-Way ANOVA on z
Significant F on the z values means variances differ    df₁ = k−1    df₂ = N−k
Symbol | Meaning | Detail
Yᵢⱼ | Observation j from group i | Raw measurement value
Ȳᵢ | Mean of group i | The group mean used as centre. Replace with median(Yᵢ) for Brown-Forsythe.
zᵢⱼ | Absolute deviation from group mean | How spread out each observation is from its group centre
ANOVA on z | The test mechanism | If variances differ, the zᵢⱼ values differ systematically between groups — ANOVA detects this
Engineering Example
Scenario: Three injection moulding machines produce the same part. Before running ANOVA on mean dimensions, test if variances are equal. n=5 parts per machine. α=0.05.
① Compute group means
Machine A: Ȳ=12.15, B: Ȳ=15.45, C: Ȳ=10.85
② Compute zᵢⱼ = |Yᵢⱼ − Ȳᵢ|
A: [0.05,0.35,0.35,0.05,0.15]
B: [0.15,0.65,0.35,0.15,0.25]
C: [0.05,0.35,0.35,0.05,0.15]
③ Run ANOVA on z values
FLevene = 1.24    Fcrit(2,12) = 3.89
④ Decision
1.24 < 3.89 → Fail to reject H₀
Variances are equal — pooled ANOVA valid ✓
Why absolute deviations? The variance of a group is the mean squared deviation from the group centre. By taking |Yᵢⱼ−Ȳᵢ| as the new response, we convert "do variances differ?" into "do mean absolute deviations differ?" — a standard ANOVA question.

Interpreting the result: Fail to reject H₀ → equal variances → use pooled ANOVA or pooled t-test. Reject H₀ → use Welch's t-test or Welch's ANOVA.

In Minitab: Levene's runs automatically as part of One-Way ANOVA output. Look for "Test for Equal Variances" in the session window.
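In Python, scipy.stats.levene runs the same procedure. The arrays below are hypothetical stand-ins, since the example above lists only the derived z values rather than the raw measurements:

from scipy.stats import levene

machine_a = [12.1, 12.5, 11.8, 12.2, 12.0]   # hypothetical raw data
machine_b = [15.3, 16.1, 15.1, 15.6, 15.2]
machine_c = [10.8, 11.2, 10.5, 11.0, 10.7]
stat, p = levene(machine_a, machine_b, machine_c, center='mean')
print(stat, p)   # expect p > 0.05 here → treat variances as equal
# center='median' gives the Brown-Forsythe variant described below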
3 · Bartlett's Test — Most Powerful When Normal
confirmed normal · χ² statistic · df=k−1
When to Use This Test
✓ 2 or more groups with confirmed normal distributions
✓ Most powerful variance equality test when normality genuinely holds
✗ Non-normal data — breaks down badly; one outlier can create a false positive
✗ Unknown distribution — use Levene's or Brown-Forsythe instead
The Formula
χ² = [(N−k) ln(Sp²) − Σ(nᵢ−1) ln(sᵢ²)] / c
c = 1 + [Σ(1/(nᵢ−1)) − 1/(N−k)] / [3(k−1)]    Sp² = Σ(nᵢ−1)sᵢ²/(N−k)    df = k−1
Symbol | Meaning | Detail
Sp² | Pooled within-group variance | Weighted average of all sᵢ² — the common variance under H₀
c | Bartlett correction factor | Adjusts for unequal group sizes. c ≈ 1 for equal n.
ln(sᵢ²) | Log of each group variance | Taking logs constructs a chi-square statistic from the variance ratios
χ² | Test statistic | Large χ² → group variances spread widely from Sp². df = k−1.
Engineering Example
Scenario: 5 production batches of a polymer, n=8 per batch. Shapiro-Wilk confirms normality in all batches. Test if batch variances are equal before pooled ANOVA. α=0.05.
① Individual variances
s₁²=1.82, s₂²=2.14, s₃²=1.91, s₄²=2.05, s₅²=1.88
② Pooled variance Sp²
Sp² = 7×(1.82+2.14+1.91+2.05+1.88)/35 = 1.960
③ Compute χ²
c = 1 + [5×(1/7) − 1/35]/(3×4) = 1.057
Σln(sᵢ²) = 3.356
χ² = [35×ln(1.96) − 7×3.356] / 1.057 = [23.55 − 23.49] / 1.057 ≈ 0.06
④ Decision
χ²crit(4df, α=0.05) = 9.488
0.06 < 9.488 → Fail to reject H₀ (the five variances are nearly identical, so χ² is tiny)
Batch variances equal — pooled ANOVA valid ✓
Why log-variance? Taking ln(sᵢ²) linearises the relationship between variance and the chi-square distribution, allowing construction of a valid test statistic through the difference between pooled and individual log-variances.

Critical warning: If even one batch had non-normal data (e.g., contaminated samples creating bimodal shape), Bartlett's would flag it as unequal variances even if the underlying processes were identical. Always run Shapiro-Wilk on each group first.

Rule of thumb:
• Normal, symmetric → Bartlett's (most powerful)
• Unknown/any distribution → Levene's
• Skewed or outliers → Brown-Forsythe
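In Python, Bartlett's χ² can be computed straight from the group variances (scipy.stats.bartlett exists as well, but it takes the raw samples rather than the variances):

import numpy as np

def bartlett_chi2(variances, n):
    # Bartlett's statistic for k groups of equal size n, from the formula above
    k = len(variances)
    N = k * n
    sp2 = sum((n - 1) * s2 for s2 in variances) / (N - k)   # pooled variance
    c = 1 + (k / (n - 1) - 1 / (N - k)) / (3 * (k - 1))     # correction factor
    return ((N - k) * np.log(sp2)
            - (n - 1) * sum(np.log(s2) for s2 in variances)) / c

print(bartlett_chi2([1.82, 2.14, 1.91, 2.05, 1.88], n=8))   # ≈ 0.06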
4 · χ² Variance Test — One Sample vs Specification
1 sample · σ² vs target · df=n−1
When to Use This Test
✓ One sample — testing if the process variance meets a specification target σ₀²
✓ Answer: "Does this machine's precision meet the engineering spec?"
📌 Requires normal data — sensitive to non-normality unlike Levene's
✗ Two or more groups — use F-test or Levene's instead
The Formula
χ² = (n−1) × s² / σ₀²
df = n−1    Two-tail: reject if χ² < χ²(df, α/2) or χ² > χ²(df, 1−α/2)
Symbol | Meaning | Detail
n−1 | Degrees of freedom | Number of observations minus one
s² | Sample variance | Computed from your n measurements: s² = Σ(xᵢ−x̄)²/(n−1)
σ₀² | Target specification variance | The maximum allowable variance from engineering requirements
χ² | Test statistic | Under H₀ (σ²=σ₀²), follows chi-square(df=n−1). Right-skewed — upper tail for "variance too large".
Engineering Example
Scenario: A precision lathe must produce shafts with σ ≤ 0.020mm (σ₀²=0.0004mm²). Sample of n=20 shafts gives s=0.023mm (s²=0.000529mm²). Has the variance exceeded the specification? α=0.05, upper one-tail.
① Hypotheses
H₀: σ²≤0.0004   H₁: σ²>0.0004
② Test Statistic
χ²=(20−1)×0.000529/0.0004
=19×1.3225=25.13
③ Critical Value
χ²crit(19df, upper α=0.05)=30.14
④ Decision
25.13<30.14 → Fail to reject H₀
Cannot confirm σ² exceeds spec at α=0.05.
But s=0.023>0.020 — monitor closely.
Chi-square distribution shape: Right-skewed, bounded at zero. For a two-tail test (is variance exactly equal to target?), two critical values are needed: χ²(df, α/2) for the lower and χ²(df, 1−α/2) for the upper.

Two-tail example:
Lower: χ²(19, 0.025)=8.91
Upper: χ²(19, 0.975)=32.85
Reject H₀ if χ²<8.91 or χ²>32.85

Always confirm normality first using Shapiro-Wilk before applying this test.
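In Python, the whole test is a few lines with scipy.stats.chi2:

from scipy.stats import chi2

# One-sample variance test for the lathe example above
n, s2, sigma0_sq = 20, 0.023**2, 0.020**2
stat = (n - 1) * s2 / sigma0_sq          # 25.13
print(stat, chi2.ppf(0.95, n - 1))       # crit 30.14 → fail to reject
print(chi2.sf(stat, n - 1))              # upper-tail p ≈ 0.16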
5 · Brown-Forsythe Test — Outlier-Resistant Variance Equality
skewed data · median-based · robust
When to Use This Test
✓ 2+ groups — testing equal variances when data is skewed or has outliers
✓ Same procedure as Levene's but uses group median instead of mean — more robust
📌 Recommended over Levene's whenever skewness or extreme values are present
✗ Clean symmetric normal data — standard Levene's or Bartlett's is sufficient
The Formula
zᵢⱼ = |Yᵢⱼ − median(Yᵢ)|   →   One-Way ANOVA on z
Identical to Levene's except median replaces mean as the group centre    df₁=k−1    df₂=N−k
Symbol | Meaning | Detail
median(Yᵢ) | Median of group i | The key difference from Levene's. The median is resistant to outliers; the mean is not.
zᵢⱼ | Absolute deviation from group median | With the median as centre, outliers contribute only moderately to the z values
ANOVA on z | Same as Levene's step 2 | Run one-way ANOVA on the zᵢⱼ deviations. Significant F → unequal variances.
Engineering Example
Scenario: Three paint formulations tested for adhesion (MPa). Data is right-skewed. Test variance equality before ANOVA. α=0.05.
① Compute group medians
Formulation A: median=12.3
B: median=15.6   C: median=11.1
② Compute zᵢⱼ = |Yᵢⱼ − medianᵢ|
A: [0.2,0.5,0.4,0.1,0.3]
B: [0.4,1.2,0.6,0.3,0.5]
C: [0.1,0.2,0.3,0.1,0.2]
③ ANOVA on z values
FBF=3.12   Fcrit(2,12)=3.89
④ Decision
3.12<3.89 → Fail to reject H₀
Variances equal despite skewed data.
Pooled ANOVA appropriate ✓
Why median beats mean here: In right-skewed data, high outliers pull the group mean upward. Deviations from that pulled mean appear large, making Levene's test incorrectly flag unequal variances. The median is unaffected by outliers — the bulk of the data drives the result.

Quick decision guide:
• Normal, symmetric → Bartlett's (most powerful)
• Any distribution, no outliers → Levene's
• Skewed or outliers present → Brown-Forsythe

In software: Minitab and JMP both run Brown-Forsythe as part of the "Test for Equal Variances" output alongside Levene's. Use the B-F result when you see strong skewness.
Family ⑤ — 7 Tests
Non-Parametric Tests

Use when normality is badly violated with small n, data is ordinal (ranked), or outliers distort parametric tests. These tests rank the data instead of using raw values — they lose some power when normality holds, but are robust and honest when it doesn't.

Parametric vs Non-Parametric — When to Switch
Situation | Parametric Test | Non-Parametric Alternative | What It Tests
1 sample or paired, non-normal | 1-sample / paired t | Wilcoxon Signed-Rank | Median = target; or median difference = 0
2 independent groups, non-normal | Independent t | Mann-Whitney U | Same distribution / median in both groups
3+ independent groups, non-normal | One-Way ANOVA | Kruskal-Wallis H | Same distribution across all groups
Repeated measures, non-normal | RM-ANOVA | Friedman Test | Same distribution across conditions
Direction of effect only | 1-sample t | Sign Test | P(positive change) = 0.5
Monotonic relationship | Pearson r | Spearman ρ / Kendall τ | Rank correlation (not just linear)
1 · Wilcoxon Signed-Rank Test
1 sample or paired · ranks |dᵢ|
When to Use
✓ One sample or paired data — non-normal distribution
✓ Ordinal scale — you can rank data but not assume normal errors
✓ More powerful than Sign Test — uses both sign AND magnitude of differences
✗ Two independent groups — use Mann-Whitney U instead
The Algorithm
W⁺ = Σ ranks of positive differences
W⁻ = Σ ranks of negative differences
T = min(W⁺, W⁻)
Reject H₀ if T ≤ W_critical (from Wilcoxon table)
Step | What to do | Detail
1 | Compute differences | dᵢ = Yᵢ − μ₀ (1-sample) or dᵢ = Y₁ᵢ − Y₂ᵢ (paired)
2 | Remove zero differences | Drop any dᵢ = 0. Reduce n accordingly.
3 | Rank absolute values | Rank |dᵢ| from 1 (smallest) to n (largest). Average the ranks of ties.
4 | Attach original signs | W⁺ = sum of ranks where dᵢ > 0; W⁻ = sum of ranks where dᵢ < 0
5 | Test statistic T | T = min(W⁺, W⁻). Reject H₀ if T ≤ T_critical from the table.
Engineering Example — Hardness Specification Check
Scenario: Target hardness = 50 HRC. 8 hardened steel parts measured. Distribution is unknown/skewed — Shapiro-Wilk suggests non-normality. Test if median = 50 HRC. α=0.05, two-tail.
Part | Y | d = Y−50 | |d| | Rank | Signed Rank
1 | 53.2 | +3.2 | 3.2 | 5.5* | +5.5
2 | 47.8 | −2.2 | 2.2 | 3 | −3
3 | 55.1 | +5.1 | 5.1 | 7 | +7
4 | 49.1 | −0.9 | 0.9 | 1 | −1
5 | 51.8 | +1.8 | 1.8 | 2 | +2
6 | 56.4 | +6.4 | 6.4 | 8 | +8
7 | 52.4 | +2.4 | 2.4 | 4 | +4
8 | 46.8 | −3.2 | 3.2 | 5.5* | −5.5
*Tie: parts 1&8 both |d|=3.2 → avg rank (5+6)/2=5.5
W⁺ = 5.5+7+2+8+4 = 26.5
W⁻ = 3+1+5.5 = 9.5
T = min(26.5, 9.5) = 9.5
Decision
T_crit(n=8, α=0.05 two-tail) = 3
9.5 > 3 → Fail to reject H₀
Median consistent with 50 HRC
Signed Ranks — Visual Balance
[Figure: signed-rank balance; W⁺=26.5 vs W⁻=9.5; T=9.5 > T_crit=3, fail to reject]
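In Python, scipy.stats.wilcoxon reproduces T = 9.5 (with a tie present it falls back to the normal approximation and issues a warning):

import numpy as np
from scipy.stats import wilcoxon

# Signed-rank test of H0: median = 50 HRC for the hardness data above
y = np.array([53.2, 47.8, 55.1, 49.1, 51.8, 56.4, 52.4, 46.8])
stat, p = wilcoxon(y - 50)     # statistic = min(W+, W−) = 9.5
print(stat, p)                 # p > 0.05 → fail to reject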
2 · Mann-Whitney U Test
2 independent groups
✓ Two independent groups, non-normal
✓ Ordinal data or continuous with outliers
📌 Also known as Wilcoxon rank-sum test
✗ Paired data — use Wilcoxon Signed-Rank
U₁ = n₁n₂ + n₁(n₁+1)/2 − W₁
U₂ = n₁n₂ + n₂(n₂+1)/2 − W₂
U = min(U₁, U₂)
W₁ = sum of ranks for group 1 (all obs ranked together)
Symbol | Meaning
W₁ | Sum of ranks for group 1 in the combined ranking of all n₁+n₂ observations
U | Count of times a group 1 observation precedes a group 2 observation in ranked order. U=0 → perfect separation.
Example — Cycle Times: Old vs New Process
Old: 42,51,48,55,49 (n₁=5)  New: 38,44,41,39,43 (n₂=5)
Combined rank all 10: New dominates lower ranks
W₁(Old)=38, W₂(New)=17
U₁=5×5+5×6/2−38=25+15−38=2
U₂=5×5+5×6/2−17=25+15−17=23
U=min(2,23)=2
Ucrit(5,5, α=0.05 one-tail)=4  2≤4 → Reject H₀
New process significantly faster
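In Python, scipy.stats.mannwhitneyu on the same data:

from scipy.stats import mannwhitneyu

old = [42, 51, 48, 55, 49]
new = [38, 44, 41, 39, 43]
U, p = mannwhitneyu(new, old, alternative='less')   # one-tail: new < old
print(U, p)   # U = 2, exact p ≈ 0.016 → reject H0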
3 · Kruskal-Wallis H Test
3+ groups · χ²(k−1)
H = [12/N(N+1)] × Σ(Rᵢ²/nᵢ) − 3(N+1)
Rᵢ = sum of ranks for group i   df = k−1
Example — 3 Suppliers, Delivery Time
3 suppliers, 5 deliveries each (days): non-normal
Rank all 15 combined → R₁=52, R₂=38, R₃=30
H=[12/(15×16)]×(52²/5+38²/5+30²/5)−3×16
H=[0.05]×(540.8+288.8+180)−48
H=50.48−48=2.48
χ²crit(2df,0.05)=5.991  2.48<5.991
Fail to reject H₀ — suppliers similar
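In Python, the H statistic can be computed directly from the rank sums above:

def kruskal_H(rank_sums, group_sizes):
    # H = [12 / N(N+1)] × Σ(Rᵢ²/nᵢ) − 3(N+1)
    N = sum(group_sizes)
    s = sum(R**2 / n for R, n in zip(rank_sums, group_sizes))
    return 12 / (N * (N + 1)) * s - 3 * (N + 1)

print(kruskal_H([52, 38, 30], [5, 5, 5]))   # ≈ 2.48 → fail to reject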
5 · Sign Test — Simplest Non-Parametric Median Test
1 sample · direction only · binomial
When to Use This Test
✓ Minimal data requirements — only the direction of difference (+ or −) can be recorded
✓ Very small n where even Wilcoxon assumptions may not hold
📌 Based on the binomial distribution — no ranking, no magnitudes required
✗ You can measure magnitude of differences — use Wilcoxon Signed-Rank (more powerful)
The Formula
B = count of + signs    B ~ Binomial(n, 0.5)
Under H₀: P(+) = P(−) = 0.5    Discard zero differences    Use binomial table or exact p-value
Symbol | Meaning | Detail
B | Count of positive differences | dᵢ = Yᵢ − μ₀. Count all dᵢ > 0. Ignore dᵢ = 0.
n | Effective sample size | Total observations minus the number of ties (zeros)
Binomial(n, 0.5) | The reference distribution | Under H₀ (median=μ₀), each difference is equally likely to be + or −
p-value | Exact probability | P(B ≥ observed) for the upper tail; two-tail: 2 × min(P(≤b), P(≥b))
Engineering Example
Scenario: A new lubricant is tested on 10 machines. Only direction of change in cycle time (faster/slower) is recorded — not the exact change. Does the lubricant reduce cycle time? α = 0.05, lower one-tail.
① Record signs only
Machine: 1 2 3 4 5 6 7 8 9 10
Change: − − + − − 0 − − + −
(0 discarded → n=9)
② Count positives
B = 2 (machines 3 and 9 got slower)
n = 9 (after discarding machine 6)
③ Exact p-value (lower tail)
P(B ≤ 2 | n=9, p=0.5)
= P(0)+P(1)+P(2) = 0.002+0.018+0.070
= 0.090
④ Decision
p = 0.090 > 0.05 → Fail to reject H₀
Insufficient evidence lubricant reduces cycle time
(7 of 9 improved — but not significant at α=0.05)
Why the Sign Test is weak: It throws away all magnitude information. Machine 8 might have improved by 30 seconds and machine 3 might have worsened by 0.1 seconds — the Sign Test treats them identically. This is why Wilcoxon is almost always preferred when you can measure the actual differences.

When the Sign Test is the right choice:
• Only direction was recorded in the data collection
• The comparison is ordinal ("better or worse" with no scale)
• Very small n (n < 6) where even Wilcoxon has almost no power
• Quick screening to confirm direction before a proper study

Power comparison (same data): Sign Test: p=0.090. Wilcoxon Signed-Rank: p≈0.025 (significant). The Sign Test missed a real effect.
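In Python, the exact sign test is a single binomial call (scipy.stats.binomtest, SciPy ≥ 1.7):

from scipy.stats import binomtest

# 2 positive signs out of n=9 non-zero differences, lower one-tail
result = binomtest(k=2, n=9, p=0.5, alternative='less')
print(result.pvalue)   # P(B ≤ 2 | n=9) = 0.090 → fail to reject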
6 & 7 · Spearman ρ and Kendall τ — Rank Correlations
monotonic relationship · ordinal · outliers present
When to Use This Test
✓ Relationship between two variables is monotonic but not necessarily linear
✓ Data is ordinal, or continuous but non-normal, or outliers are present
📌 Spearman ρ: non-parametric version of Pearson r — faster to compute, more familiar
📌 Kendall τ: probability interpretation — more meaningful for small n and many ties
The Formula
ρ = 1 − 6Σdᵢ² / [n(n²−1)]
Kendall τ = (C − D) / [n(n−1)/2]    C = concordant pairs    D = discordant pairs
Symbol | Meaning | Detail
dᵢ | Rank difference for pair i | dᵢ = rank(Xᵢ) − rank(Yᵢ) — rank each variable separately, then subtract
Σdᵢ² | Sum of squared rank differences | Large Σdᵢ² → ranks are misaligned → low correlation
C | Concordant pairs (Kendall) | Pairs (i, j) where the X and Y orderings agree: (Xᵢ−Xⱼ)(Yᵢ−Yⱼ) > 0
D | Discordant pairs (Kendall) | Pairs where the X rank order disagrees with the Y rank order
τ interpretation | Probability | τ = P(concordant) − P(discordant). τ=0.6 means 60% more concordant than discordant pairs.
Engineering Example
Scenario: An engineer ranks 8 circuit boards by visual quality (1=worst, 8=best) and measures their failure time (hours). Is quality rank correlated with failure time? Non-normal distribution. α = 0.05.
① Data and ranks
Board | Quality Rank X | Failure hr | Rank Y | dᵢ | dᵢ²
A | 1 | 420 | 2 | −1 | 1
B | 2 | 380 | 1 | +1 | 1
C | 3 | 580 | 4 | −1 | 1
D | 4 | 620 | 5 | −1 | 1
E | 5 | 490 | 3 | +2 | 4
F | 6 | 710 | 6 | 0 | 0
G | 7 | 820 | 7 | 0 | 0
H | 8 | 910 | 8 | 0 | 0
Σdᵢ² = 8
② Compute ρ
ρ = 1 − 6×8 / [8×(64−1)]
= 1 − 48/504 = 1 − 0.095 = 0.905
③ Test significance (n=8)
t = ρ√(n−2)/√(1−ρ²)
= 0.905×√6/√(1−0.819)
= 0.905×2.449/0.425 = 5.21
tcrit(6df, 0.05) = 2.447
④ Decision
5.21 > 2.447 → Reject H₀
Significant rank correlation (ρ=0.905)
Higher visual quality → longer failure time
Spearman vs Pearson: Pearson r measures linear relationship using raw values. Spearman ρ measures monotonic relationship using ranks. For ordinal data or continuous data with outliers, Spearman is more appropriate — one extreme outlier can dominate Pearson but has only rank ±1 effect on Spearman.

Spearman vs Kendall:
• ρ is more familiar and easier to compute
• τ has a clearer probability interpretation (P(concordant) − P(discordant))
• τ is more appropriate when many ties exist
• For n > 30 with few ties: ρ ≈ 3τ/2

Significance testing: For n > 10, use the t-statistic shown above. For n ≤ 10, use exact Spearman critical value tables.
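In Python, scipy.stats.spearmanr and kendalltau on the board data above:

from scipy.stats import spearmanr, kendalltau

quality = [1, 2, 3, 4, 5, 6, 7, 8]                  # visual quality rank
hours = [420, 380, 580, 620, 490, 710, 820, 910]    # failure time, hr
rho, p_rho = spearmanr(quality, hours)              # ρ ≈ 0.905
tau, p_tau = kendalltau(quality, hours)
print(rho, p_rho, tau, p_tau)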
Family ⑥ — 8 Tests
Correlation, Regression & Normality Tests

Test relationships between variables (correlation, regression), validate model assumptions (normality, independence of residuals), and compare survival or reliability curves. These tests are prerequisites for and extensions of the parametric means tests in Family ①.

1 · Pearson Correlation (r)
linear relationship · both normal
When to Use
✓ Both variables are continuous and approximately normal
✓ Testing if there is a linear relationship between X and Y
✗ r=0 does NOT mean no relationship — only no linear one. Always plot first.
📌 Non-parametric alternative: Spearman ρ (Family ⑤) for ordinal or non-normal data
The Formula
r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √[Σ(xᵢ−x̄)² × Σ(yᵢ−ȳ)²]
Test H₀: ρ=0 using   t = r√(n−2) / √(1−r²)   df = n−2
Symbol | Meaning | Detail
r | Sample correlation coefficient | −1 ≤ r ≤ +1. Perfect negative = −1, none = 0, perfect positive = +1
Σ(xᵢ−x̄)(yᵢ−ȳ) | Sample covariance (unnormalised) | Measures how X and Y vary together. Positive = both increase together.
r² (R²) | Coefficient of determination | Proportion of variance in Y explained by X. r=0.78 → R²=0.61 (61% explained).
t = r√(n−2)/√(1−r²) | Test statistic for H₀: ρ=0 | Follows a t-distribution with df = n−2. Use a standard t-table.
Engineering Example — Temperature vs Viscosity
Scenario: A polymer process engineer measures melt temperature (°C) and melt viscosity (Pa·s) for n=20 samples. Is there a significant linear correlation? α=0.05.
① Calculate r
r = 0.78 (computed from data)
r² = 0.61 (61% variance explained)
② Test Statistic
t = 0.78×√18 / √(1−0.6084)
= 0.78×4.243 / 0.6258 = 5.29
df = n−2 = 18
③ Critical Value
tcrit(18df, α=0.05, two-tail) = ±2.101
④ Decision
5.29 ≫ 2.101 → Reject H₀
Significant linear correlation
Conclusion: Temperature and viscosity are significantly correlated (r=0.78, p<0.001). 61% of viscosity variation is explained by temperature. Proceed to regression analysis.
t-Distribution Rejection Region (df=18)
[Figure: t-distribution (df=18); t=5.29 lies far beyond the ±2.101 rejection bounds, reject H₀]
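In Python, the significance test can be run straight from r (when the raw data is available, scipy.stats.pearsonr returns the same p-value):

from math import sqrt
from scipy.stats import t as t_dist

r, n = 0.78, 20
t = r * sqrt(n - 2) / sqrt(1 - r**2)      # 5.29
print(t, 2 * t_dist.sf(abs(t), n - 2))    # two-tail p < 0.001 → reject H0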
2 & 3 · Regression t-Test & Overall F-Test
after building regression model
When to Use Each
Regression t-test: Tests if each individual coefficient β ≠ 0. One t-test per predictor. "Does this variable contribute significantly?"
Overall F-test: Tests if the entire model explains any variance. "Is the model as a whole significant?" Run this first.
📌 A model can be overall significant (F) but have individual non-significant t's — multicollinearity or redundant predictors
⚡ Interpret t-tests only after confirming Overall F is significant
Regression t-Test Formula
t = b̂ⱼ / SE(b̂ⱼ)
df = n − p − 1    (p = number of predictors)
Overall F-Test Formula
F = MS_Regression / MS_Residual
df₁ = p    df₂ = n − p − 1
Symbol | Meaning | Detail
b̂ⱼ | Estimated regression coefficient for predictor j | How much Y changes per unit increase in Xⱼ, holding the others constant
SE(b̂ⱼ) | Standard error of the coefficient estimate | Uncertainty in b̂ⱼ. From regression output (covariance matrix of the estimates).
MS_Regression | Mean square explained by the model | SS_Regression / p
MS_Residual | Mean square unexplained (error) | SS_Residual / (n−p−1)
Example: Cycle time (sec) regressed on Temperature (X₁) and Pressure (X₂). n=25 observations. Two predictors (p=2).
Overall F-test:
F = 18.7  df=(2,22)
Fcrit(2,22)=3.44
18.7>3.44 → Model significant ✓
R² = 0.63 (63% explained)
Coefficient t-tests (df=22):
b̂₁=2.34, SE=0.61 → t=3.84 ✓ Sig.
b̂₂=0.12, SE=0.19 → t=0.63 ✗ Not sig.

→ Remove X₂ (Pressure) — not contributing
→ Refit model with X₁ only
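In Python, a minimal least-squares sketch showing where the coefficient t's and the overall F come from. The data is synthetic (the study's 25 observations are not listed above), so the printed values only illustrate the mechanics:

import numpy as np

rng = np.random.default_rng(1)
n, p = 25, 2
temp = rng.uniform(180, 220, n)                   # X1
pres = rng.uniform(50, 80, n)                     # X2 (built to contribute nothing)
y = 10 + 2.3 * temp + rng.normal(0, 12, n)

X = np.column_stack([np.ones(n), temp, pres])     # design matrix with intercept
b = np.linalg.lstsq(X, y, rcond=None)[0]          # least-squares coefficients
resid = y - X @ b
mse = resid @ resid / (n - p - 1)                 # MS_Residual, df = n−p−1
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
print(b / se)                                     # one t per coefficient
ss_reg = ((X @ b - y.mean()) ** 2).sum()
print((ss_reg / p) / mse)                         # overall F = MS_Reg / MS_Res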

Normality Tests (4–6) — Prerequisite for Families ① and ④

4 · Shapiro-Wilk Test — Best Normality Test for Small n
H₀: data is normal · n < 50 · W statistic
When to Use This Test
✓ Testing if a dataset follows a normal distribution — n < 50
✓ Most powerful normality test for small samples — default when n is limited
📌 W close to 1 → consistent with normality. Reject H₀ (W small) → non-normal → use Family ⑤
✗ n > 50 — Anderson-Darling is preferred for larger samples
The Formula
W = (Σ aᵢ x₍ᵢ₎)² / Σ(xᵢ − x̄)²
x₍ᵢ₎ = ordered observations (x-order statistics)    aᵢ = expected normal order statistic coefficients    0 < W ≤ 1
Symbol | Meaning | Detail
x₍ᵢ₎ | Order statistics | Your n observations sorted smallest to largest: x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎
aᵢ | Expected normal order statistic coefficients | Tabulated constants. For n=5: a₁=0.6646, a₂=0.2413. Available in Shapiro-Wilk tables.
Numerator (Σaᵢx₍ᵢ₎)² | Weighted linear combination | Measures how well the ordered data matches the expected pattern of a normal distribution
Denominator Σ(xᵢ−x̄)² | Total sum of squares | Unnormalised sample variance. W = 1 only when the ordered data matches the normal pattern exactly.
Engineering Example
Scenario: A quality engineer collects n=12 bearing diameter measurements before running a t-test. First, confirm the data is approximately normal. α = 0.05.
① Data (n=12) and H₀
H₀: data is from a normal distribution
H₁: data is not normal
Data: 25.1,24.8,25.3,25.0,24.9,25.2,
25.1,25.0,24.7,25.4,25.2,24.9 mm
② Sort and apply aᵢ coefficients
x₍₁₎=24.7, x₍₂₎=24.8, ..., x₍₁₂₎=25.4
a₁=0.5475, a₂=0.3325, ... (from table)
b = Σaᵢ(x₍₁₃₋ᵢ₎ − x₍ᵢ₎) = 0.680
③ Compute W
W = b² / Σ(xᵢ−x̄)²
= 0.462 / 0.470 = 0.983
④ Decision
Wcrit(n=12, α=0.05) = 0.859
0.983 > 0.859 → Fail to reject H₀
Data consistent with normality ✓
t-test is appropriate
What W actually measures: W is the ratio of the best linear unbiased estimate of σ² (using the order statistics) to the ordinary sample variance. If the data is truly normal, these two estimates should agree closely, giving W ≈ 1. Non-normal data creates a mismatch: W drops below 1.

When to reject: Reject H₀ when W < W_critical. The critical values come from Shapiro-Wilk tables (n from 3 to 50). In software, use the p-value: reject if p < 0.05.

Practical rule of thumb:
• W > 0.95: strong evidence of normality
• 0.90 < W < 0.95: minor non-normality — t-test usually robust
• W < 0.90: significant non-normality — use non-parametric test

In software: Minitab: Stat → Basic Statistics → Normality Test. R: shapiro.test(x).
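In Python:

from scipy.stats import shapiro

d = [25.1, 24.8, 25.3, 25.0, 24.9, 25.2,
     25.1, 25.0, 24.7, 25.4, 25.2, 24.9]   # bearing diameters, mm
W, p = shapiro(d)
print(W, p)   # W ≈ 0.98, p well above 0.05 → consistent with normality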
5 · Kolmogorov-Smirnov Test — CDF Comparison
empirical vs theoretical CDF · or 2-sample
When to Use This Test
✓ One-sample: compare empirical distribution to any fully specified theoretical distribution
✓ Two-sample: compare two unknown distributions to each other — no normality assumed
📌 The two-sample K-S is a general-purpose distribution equality test — works for any shape
✗ Less powerful than Anderson-Darling at the tails — use A-D for reliability/lifetime data
The Formula
D = sup |Fₙ(x) − F₀(x)|
Fₙ(x) = empirical CDF = proportion of observations ≤ x    F₀(x) = theoretical CDF    D = maximum absolute gap
Symbol | Meaning | Detail
Fₙ(x) | Empirical CDF | Step function: Fₙ(x) = (number of observations ≤ x) / n. Jumps by 1/n at each data point.
F₀(x) | Theoretical CDF | The distribution you are testing against (Normal, Weibull, etc.). Must be fully specified — mean and σ known.
D | K-S statistic | Maximum vertical distance between Fₙ and F₀ anywhere on the x-axis. Larger D = greater departure.
sup | Supremum | The maximum over all x — the worst-case discrepancy between the empirical and theoretical CDFs
Engineering Example
Scenario: n=30 tensile strength measurements. Test if the data follows a Normal(μ=480, σ=22) distribution. α=0.05.
① Build empirical CDF
Sort n=30 observations.
At each xᵢ: Fₙ(xᵢ) = i/30
E.g., 5th value x₍₅₎=455: Fₙ=5/30=0.167
② Compare to Normal CDF
F₀(455) = Φ((455−480)/22) = Φ(−1.14) = 0.127
|Fₙ(455)−F₀(455)| = |0.167−0.127| = 0.040
③ Find maximum gap D
Compute |Fₙ−F₀| at every data point.
D = max of all these differences = 0.121
④ Decision
Dcrit(n=30, α=0.05) = 0.242
0.121 < 0.242 → Fail to reject H₀
Data consistent with N(480, 22²)
Visual intuition: Plot the step function of your sorted data (empirical CDF) alongside the smooth S-curve of the theoretical CDF. D is the largest vertical gap between the two. If this gap exceeds the critical value, the distributions are significantly different.

Two-sample K-S: Instead of comparing to a theoretical F₀, compare two empirical CDFs: D = sup|Fₙ₁(x) − Fₙ₂(x)|. This tests whether two samples come from the same distribution — no assumptions about what that distribution is. Useful for comparing before/after distributions of a process change.

Important limitation: The parameters (μ=480, σ=22) must be specified independently — not estimated from the same data. If you estimate them from data and then test goodness-of-fit, the K-S critical values are no longer correct. Use Lilliefors correction in that case.
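In Python, a sketch with scipy.stats.kstest. The 30 tensile measurements are not listed above, so a simulated sample stands in:

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(7)
data = rng.normal(480, 22, size=30)          # stand-in for the real sample
D, p = kstest(data, 'norm', args=(480, 22))  # F0 fully specified, no Lilliefors issue
print(D, p)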
6 · Anderson-Darling Test — Tail-Sensitive Normality
weights tails · reliability data · Minitab default
When to Use This Test
✓ Testing normality when the distribution tails are important (reliability, extreme events)
✓ Medium to large n (50–200+) where A-D is more powerful than Shapiro-Wilk
📌 Can also test Weibull, exponential, lognormal — not just normal distributions
📌 Default normality test in Minitab — what you see in the "Normality Test" output
The Formula
A² = −n − (1/n) Σ(2i−1)[ln F(x₍ᵢ₎) + ln(1−F(x₍ₙ₊₁₋ᵢ₎))]
x₍ᵢ₎ = sorted observations    F = CDF of the distribution under H₀    Smaller A² = better fit
Symbol | Meaning | Detail
x₍ᵢ₎ | Sorted observations | x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎ — the same order statistics used in Shapiro-Wilk
F(x₍ᵢ₎) | CDF at order statistic i | For a normality test: F(x) = Φ((x−x̄)/s) — evaluated at each data point
(2i−1) | Weight function | Gives extra weight to the i=1 and i=n terms — the tails. This is why A-D is more tail-sensitive than K-S.
A² | Anderson-Darling statistic | Reject H₀ if A² exceeds the critical value for the chosen distribution and α
Engineering Example
Scenario: 25 component lifetime measurements (hours) from an ALT study. Test if lifetimes follow a normal distribution before applying parametric analysis. α = 0.05.
① Sort data and compute Fₙ(x₍ᵢ₎)
Sort n=25 lifetimes, compute x̄ and s.
For each x₍ᵢ₎: F(x₍ᵢ₎) = Φ((x₍ᵢ₎−x̄)/s)
② Apply weighted sum formula
For each i=1..25:
Term_i = (2i−1)[lnF(x₍ᵢ₎)+ln(1−F(x₍₂₆₋ᵢ₎))]
A² = −25 − (1/25)×ΣTerm_i = 0.412
③ Apply correction for estimated parameters
A²* = A²(1 + 4/n − 25/n²)
= 0.412(1+0.16−0.04) = 0.412×1.12 = 0.462
④ Decision
A²crit(α=0.05, normal) = 0.752
0.462 < 0.752 → Fail to reject H₀
Data consistent with normal distribution
p-value ≈ 0.24
Why tail-weighting matters: The term (2i−1) means the first and last observations (the extremes) receive the most weight in the A² sum. This makes Anderson-Darling particularly sensitive to departures from normality in the tails — which is precisely where reliability data is most likely to deviate (early failures, wear-out tails).

A-D for other distributions: Replace F(x) with the Weibull, lognormal, or exponential CDF. Minitab's "Individual Distribution Identification" runs A-D for 14 distributions simultaneously and shows which fits best. For reliability engineers, the Weibull A-D test is often the first step.

A-D vs Shapiro-Wilk: Both good normality tests. Use S-W for n < 50 (more powerful). Use A-D for n > 50 or when testing non-normal distributions. In practice, run both and look at the probability plots — the visual always supplements the test.
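In Python, scipy.stats.anderson returns A² along with critical values at several significance levels. The 25 lifetimes are not listed above, so a simulated sample stands in:

import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(3)
life = rng.normal(1200, 150, size=25)    # stand-in for the ALT lifetimes
res = anderson(life, dist='norm')
print(res.statistic)                     # A² statistic
print(res.significance_level)            # [15, 10, 5, 2.5, 1] (%)
print(res.critical_values)               # compare A² to the 5% entry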

Correlation, Regression & Time Series

Pearson Correlation Coefficient (r)

r value | Interpretation
r = +1 | Perfect positive linear relationship
0 < r < 1 | Positive correlation (as X increases, Y increases)
r = 0 | No linear relationship
−1 < r < 0 | Negative correlation (as X increases, Y decreases)
r = −1 | Perfect negative linear relationship
⚠️ Correlation ≠ Causation

A strong correlation between X and Y does not mean X causes Y. Both may be driven by a third variable (confounding). Example: ice cream sales and drowning rates are positively correlated — both caused by hot weather.

Coefficient of Determination r²

r² = proportion of variance in Y explained by X (0 to 1). If r=0.88 → r²=0.77 → 77% of variance in Y is explained by X. Remaining 23% is unexplained.

Fisher's Z Transformation — CI for Correlation

Since r is not normally distributed, a 3-step process is needed to find CI for the population correlation ρ:

  1. Convert r to z' (Fisher's transformation): z' = 0.5·[ln(1+r) − ln(1−r)]
  2. Build CI in z' space: SE = 1/√(N−3), then z'±zα/2·SE
  3. Back-transform CI limits from z' to r
Worked Example: N=10, r=0.88, 95% CI
Step 1: z' = 0.5[ln(1.88)−ln(0.12)] = 1.375
Step 2: SE = 1/√(10−3) = 0.378
CI = 1.375 ± 1.96×0.378
z' range: 0.635 to 2.11
Step 3: Back-transform → r: 0.56 to 0.97
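In Python, the three steps take a few lines (np.arctanh is exactly Fisher's transformation, and np.tanh inverts it):

import numpy as np

N, r = 10, 0.88
z = np.arctanh(r)                 # Step 1: z' = 0.5·ln((1+r)/(1−r)) ≈ 1.375
se = 1 / np.sqrt(N - 3)           # Step 2: SE ≈ 0.378
lo, hi = z - 1.96 * se, z + 1.96 * se
print(np.tanh(lo), np.tanh(hi))   # Step 3: back-transform → ≈ 0.56 to 0.97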

Regression & Time Series — Strongest Upgrade (Real Data + Visual Learning)

NIST-aligned learning system
Use the graph to understand the data structure first. Then model. Then diagnose. Then decide.

This upgrade follows the NIST/SEMATECH engineering-statistics philosophy: graphics are not decoration, and modeling should never be separated from diagnostics. For regression, that means fit + residuals + structure checks. For time series, that means trend + seasonality + dependence before forecasting.

Real-data example: Anscombe Data Set I

NIST uses Anscombe's example to show why graphics are essential. We start with Data Set I, which behaves approximately linearly and is appropriate for a simple linear regression. The model is Y = β₀ + β₁X + ε. Least squares chooses the line that minimizes the sum of squared residuals.

X: 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5 Y₁: 8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68

What users should learn from this example

  • Slope: change in Y for one-unit change in X.
  • Intercept: fitted Y when X = 0.
  • R²: how much of the Y variation is explained by X.
  • Residuals: the model's errors — the real diagnostic layer.
Equation
ŷ = 3.00 + 0.50x
The slope tells the engineering effect size.
Correlation
r = 0.816
A strong positive association exists, but correlation alone is never enough.
R²
0.667
Explained variation, not proof that the model is correct.
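In Python, the whole example is reproducible in a few lines:

import numpy as np

# Anscombe Data Set I: fit, correlation, and the residual diagnostic layer
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
slope, intercept = np.polyfit(x, y, 1)    # ≈ 0.50 and ≈ 3.00
r = np.corrcoef(x, y)[0, 1]               # ≈ 0.816, so R² ≈ 0.667
residuals = y - (intercept + slope * x)   # plot these before trusting the line
print(slope, intercept, r, r**2)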
Fit + Residual Diagnostics in One View
This is how to teach regression properly: not just the line, but the line plus its mistakes. NIST emphasizes residual analysis because the line by itself can be misleading.
[Figure: Anscombe Data Set I, fitted line (left) and residual plot (right)]
How to explain it: the left panel answers “what line fits the data?” The right panel answers “are the residuals random enough that the linear model is reasonable?” A good model shows residuals centered around zero without curvature, fanning, or a strong trend.
When simple linear regression is appropriate

One response, one predictor, approximately linear relationship, no strong time-order dependence, and residual variation that is roughly constant.

What to check before trusting the model

Scatter plot shape, residual plot, unusual points, leverage/influence, and whether the physics actually supports a straight-line relationship.

Real-data example: Anscombe's Quartet

NIST uses Anscombe's quartet to prove a crucial lesson: four data sets can have nearly identical summary statistics and regression results, yet have completely different structures. That means numbers alone can hide the truth.

Why this belongs in your site

  • Users immediately understand why plots matter.
  • It prevents blind trust in slope, r, and R².
  • It visually explains linearity, outliers, curvature, and leverage.
Same Statistics. Different Reality.
All four data sets have nearly the same mean, slope, intercept, and correlation, but the scatter plots tell completely different stories.
[Figure: scatter plots of the four Anscombe data sets]
Teaching message: Data set I is approximately linear. Data set II is curved. Data set III has an outlier. Data set IV is dominated by one influential point. This is exactly why the NIST handbook treats exploratory graphics as essential, not optional.
What graphs reveal that summary statistics hide

Curvature, clusters, outliers, leverage points, unequal spread, and poor experimental design.

Best practice to teach users

Always look at the scatter plot first, then fit the model, then inspect residuals. Never reverse that order.

Real-data example: NIST monthly CO₂ concentrations

The NIST handbook uses monthly CO₂ concentrations from Mauna Loa as a sample time-series data set. Time-series data must be treated differently from ordinary regression data because the observations are ordered in time and can have trend, seasonality, and autocorrelation.

CO₂ (1974–1977 subset): 333.13, 332.09, 331.10, 329.14, 327.36, 327.29, 328.23, 329.55, 330.62, 331.40, ...

What users should learn from this example

  • Trend: the long-term level is rising.
  • Seasonality: there is a repeating annual pattern.
  • Smoothing: moving averages reveal the underlying path.
  • Modeling rule: identify the structure before forecasting.
Run Sequence View — Trend + Seasonality
This real NIST sample shows why time-series analysis exists: the data are not just a cloud of independent points. The 12-point moving average helps reveal the underlying level.
[Figure: NIST CO₂ monthly series with 12-point moving average]
How to explain it: the raw line has short-term variation, but the bigger story is a rising long-term level with a repeating annual cycle. If users fit a simple straight line and ignore the seasonal pattern, they will miss important structure.
Seasonal Subseries View — See the Repeating Cycle Directly
NIST highlights seasonal subseries plots as a tool for detecting seasonality when the period is known. For monthly data, the period is usually 12.
[Figure: NIST CO₂ seasonal subseries plot, period 12]
Teaching message: this view makes the seasonality obvious. In this CO₂ subset, the series peaks around May and falls through late summer/early autumn. That repeating structure is exactly what seasonal methods are designed to capture.
Time-series workflow users should remember

Plot the series, check for trend, check for seasonality, check for dependence, smooth only to reveal structure, then choose a forecasting method.

When not to use ordinary regression alone

When data are collected over time and adjacent observations are related. Independence is no longer a safe assumption.

Reliability Engineering

Quantitative methods for predicting, measuring, and improving product reliability — from MTBF calculations to Weibull analysis and system configuration modeling.

Core Reliability Metrics

Six numbers tell the complete reliability story of any system. Understanding how they connect — and what levers you pull to improve each — is the foundation of reliability engineering.

How the Metrics Connect — Follow the Chain
Observed Data (count of failures F, total operating hours T)
→ Failure Rate: λ = F / T (failures/hr)
→ MTBF = 1 / λ (mean hrs between failures)
+ MTTR = repair hours ÷ failures (mean hrs to repair)
→ Availability = MTBF / (MTBF + MTTR) (fraction of time operational)
→ FIT Rate = λ × 10⁹ (failures per billion hr)
→ R(t) = e^(−λt) (probability of still working at time t)
📋 Worked Example — Industrial Pump System

A fleet of 10 pumps operated for 50,000 hours total. During this period, 5 failures were recorded with a total repair time of 20 hours.

λ = 5 ÷ 50,000 = 0.0001 failures/hr
MTBF = 50,000 ÷ 5 = 10,000 hr
MTTR = 20 ÷ 5 = 4 hr per repair
A = 10,000 ÷ (10,000 + 4) = 99.96%
FIT = 0.0001 × 10⁹ = 100,000 FIT
R(2,000) = e^(−2000/10000) = 81.9%
R(t) — Reliability Decay over Time (MTBF = 10,000 hr)
[Figure: exponential reliability decay with MTBF = 10,000 hr; R(t) passes through 81.9% at t = 2,000 hr]
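In Python, the full metric chain for the pump fleet:

from math import exp

failures, hours, repair_hours = 5, 50_000, 20
lam = failures / hours                   # λ = 0.0001 failures/hr
mtbf = 1 / lam                           # 10,000 hr
mttr = repair_hours / failures           # 4 hr per repair
availability = mtbf / (mtbf + mttr)      # 0.9996 → 99.96%
fit = lam * 1e9                          # 100,000 FIT
print(availability, fit, exp(-lam * 2000))   # R(2,000 hr) ≈ 0.819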

The Four Fundamental Functions — How They Derive from Each Other

Every reliability distribution is built from a single starting point: the probability density function f(t). All other functions follow by integration or differentiation. This is the NIST 8.1.6 framework — not four separate formulas, but one coherent system.

① f(t) — Failure Density (PDF)
f(t) ≥ 0, ∫₀^∞ f(t) dt = 1
Probability of failure in the instant [t, t+dt]
— integrate from 0 to t ↓
② F(t) — Cumulative Failure (CDF)
F(t) = ∫₀ᵗ f(u) du
F(0) = 0, F(∞) = 1
Fraction of the population that has failed by time t
— complement: R(t) = 1 − F(t) ↓
③ R(t) — Reliability / Survival
R(t) = 1 − F(t) = ∫ₜ^∞ f(u) du = exp[−H(t)]
Fraction of the population surviving beyond time t
— ratio: h(t) = f(t) / R(t), and back again: f(t) = h(t)·R(t) ↓
④ h(t) — Hazard Rate
h(t) = f(t) / R(t) = −d[ln R(t)]/dt
Instantaneous failure risk, given survival to t. Integrates to H(t), and R(t) = exp[−H(t)].
Master Formula
R(t) = exp[−H(t)] = exp[−∫₀ᵗ h(u) du]

Every reliability distribution is fully specified by its hazard function h(t). The shape of h(t) determines the failure behaviour — decreasing, constant, or increasing — which maps directly to the three phases of the bathtub curve.

Hazard Rate h(t) — Three Shapes, Three Stories

The hazard function h(t) is the most informative reliability curve. Its shape tells you what kind of failure mechanism is at work and what action to take.

Decreasing h(t) — DFR
Early-life / Infant Mortality
[Curve: decreasing h(t), Weibull β < 1]
When: Manufacturing defects, poor welds, wrong parts. Failures happen early and then rate drops.
Action: Burn-in testing, incoming inspection, supplier qualification.
Constant h(t) — CFR
Useful Life (Random Failures)
[Curve: constant h(t), Exponential / Weibull β = 1]
When: Random external events, human error, overstress. Failures don't depend on age (memoryless).
Action: MTBF tracking, redundancy design, maintenance intervals.
Increasing h(t) — IFR
Wear-out / End of Life
[Curve: increasing h(t), Weibull β > 1]
When: Fatigue, corrosion, mechanical wear, degradation with use. Older = more likely to fail.
Action: Preventive maintenance schedules, replacement before B10 life.

Key Distributions — Formula Sets

Two distributions cover the majority of reliability engineering problems. Know their hazard shapes and when to use each.

Exponential Distribution — h(t) = λ (constant)
f(t) = λ·e^(−λt)
F(t) = 1 − e^(−λt)
R(t) = e^(−λt)
h(t) = λ   (memoryless)
MTTF = 1/λ · Use for: electronic components in useful life, random failure events
Weibull Distribution — h(t) = (β/η)(t/η)^(β−1)
R(t) = e^(−(t/η)^β)
F(t) = 1 − e^(−(t/η)^β)
h(t) = (β/η)(t/η)^(β−1)
MTTF = η·Γ(1+1/β)
β: shape · η: characteristic life · Use for: bearings, fatigue, wear-out — any failure phase

Quick Reference — Model Selection Guide

Model | R(t) Formula | h(t) Shape | β (Weibull) | Use When | Typical Applications
Exponential | e^(−λt) | Constant ─ | β = 1 | Random, memoryless failures | Electronics, software, random events
Weibull (β<1) | e^(−(t/η)^β) | Decreasing ↘ | 0.5–0.9 | Infant mortality, manufacturing defects | Early field failures, weld defects
Weibull (β>1) | e^(−(t/η)^β) | Increasing ↗ | 2–4 typical | Wear-out, fatigue, ageing | Bearings, tyres, mechanical wear
Lognormal | 1 − Φ[(ln t−µ)/σ] | Peaks then drops | — | Fatigue crack propagation, corrosion | Metals fatigue, semiconductor oxide
Normal | 1 − Φ[(t−µ)/σ] | Increasing ↗ | — | Tight wear-out with known life | Light bulbs, precision wear mechanisms
Gamma | 1 − I(λt, k) | Varies with k | — | Systems requiring k failures to fail | Standby redundancy, shock models
💡

Which model to choose? Plot your data on Weibull probability paper first. If it falls on a straight line, Weibull fits. If the β you estimate is 1.0, use the simpler exponential. Only choose lognormal or normal when engineering knowledge of the failure mechanism supports it.

The Bathtub Curve — Failure Rate Over Product Lifetime

The bathtub curve describes how the failure rate λ(t) changes across a product's life. Three distinct phases require different engineering strategies.

Failure Rate λ(t) vs Time — The Classic Bathtub Curve
[Figure: bathtub curve; infant mortality (decreasing λ, β<1, burn-in testing), useful life (constant λ, β=1, preventive maintenance; MTBF is calculated from this phase), wear-out (increasing λ, β>1, scheduled replacement)]
Phase 1 — Infant Mortality

Decreasing Failure Rate

High initial failure rate that falls rapidly. Caused by manufacturing defects, design weaknesses, and substandard components.

  • Burn-in / ESS testing
  • Process improvement (SPC)
  • Incoming inspection
Phase 2 — Useful Life

Constant Failure Rate

Low, approximately constant random failure rate. MTBF = 1/λ applies here. Normal operating life of the product.

  • Exponential distribution (β=1)
  • Preventive maintenance
  • Redundancy design
Phase 3 — Wear-Out

Increasing Failure Rate

Failure rate rises as components age, fatigue, or corrode. Planned maintenance replaces components before this phase starts.

  • Predictive maintenance
  • Scheduled replacement (B10)
  • Weibull β > 1

Weibull Analysis — The Universal Reliability Distribution

The Weibull distribution models all three bathtub phases by adjusting a single parameter β. It's the most widely used distribution in reliability engineering.

Reliability Function
R(t) = exp[−(t/η)^β]
Probability of surviving to time t
Cumulative Failure
F(t) = 1 − exp[−(t/η)^β]
Fraction failed by time t
Hazard Rate
h(t) = (β/η) × (t/η)^(β−1)
Instantaneous failure rate at time t
Mean Time to Failure
MTTF = η × Γ(1 + 1/β)
Expected life; Γ = gamma function
B10 Life
B10 = η × (−ln 0.90)^(1/β)
Time at which 10% of units have failed
📊 Weibull Hazard Rate h(t) for Different β Values
[Figure: Weibull h(t) for β<1 (infant mortality), β=1 (exponential/random), β=2 (early wear-out), β=3.5 (normal-like wear-out)]

Interpreting β

β < 1
Infant mortality. Failure rate decreasing. Manufacturing or design defects. Burn-in recommended.
β = 1
Constant random failures. Useful-life phase. Exponential distribution. MTBF = η.
β = 2
Early wear-out. Linearly increasing hazard. Ball bearings, seals, O-rings.
β ≈ 3.5
Normal-like wear-out. Common for mechanical fatigue, gears, springs. Symmetric failure distribution.
📌

Characteristic Life η: Always the time at which 63.2% of units fail, regardless of β. F(η) = 1 − e⁻¹ = 0.632. On a Weibull probability plot, η is where the fitted line crosses the 63.2% horizontal.

Generalised Bx Life — Beyond B10

B10 is the automotive standard, but any Bx life (the time by which x% of units have failed) can be computed directly from the Weibull parameters. This is the NIST-standard approach (NIST 8.2.2).

General Bx Formula — Time at Which x% Have Failed
B_x = η · [−ln(1 − x/100)]^(1/β)
B1 Life (1% failure)
η · [−ln(0.99)]^(1/β)
= η · (0.01005)^(1/β)
B10 Life (10% failure)
η · [−ln(0.90)]^(1/β)
= η · (0.10536)^(1/β)
B50 Life (50% failure)
η · [−ln(0.50)]^(1/β)
= η · (0.69315)^(1/β)
Worked Example — η = 8,000 hr, β = 2.5
B1 = 8000 · (0.01005)^(1/2.5) = 8000 · 0.1588 = 1,270 hr
B10 = 8000 · (0.10536)^(1/2.5) = 8000 · 0.4065 = 3,252 hr
B50 = 8000 · (0.69315)^(1/2.5) = 8000 · 0.8636 = 6,909 hr
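In Python, the general formula in one function, reproducing the values above:

from math import log

def bx_life(eta, beta, x_percent):
    # B_x = η · [−ln(1 − x/100)]^(1/β)
    return eta * (-log(1 - x_percent / 100)) ** (1 / beta)

for x in (1, 10, 50):
    print(x, round(bx_life(8000, 2.5, x)))   # ≈ 1,270 / 3,252 / 6,909 hr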

Weibull Probability Plotting — Rank Regression Method (NIST 8.2.2)

The Weibull probability plot linearises the Weibull CDF so failure data falls on a straight line — slope gives β, x-intercept at 63.2% gives η. The step-by-step NIST procedure:

Step 1 — Rank the Failure Times

Order n failures as t₁ < t₂ < … < tₙ. Assign median rank (Benard's approximation):

F̂ᵢ = (i − 0.3) / (n + 0.4)
More accurate than i/n for small samples. NIST-recommended. Also used in ReliaSoft Weibull++.
Step 2 — Linearise the CDF

Take double log of both sides of R(t) = e^(−(t/η)^β):

ln[ln(1/(1−F))] = β·ln(t) − β·ln(η)
Y = β·X − β·ln(η)
Y = ln[ln(1/(1−F))], X = ln(t). Plot Y vs X — should be linear for Weibull.
Step 3 — Fit & Extract Parameters

Fit straight line to (ln(tᵢ), ln[ln(1/(1−F̂ᵢ))]) by least squares:

β̂ = slope of fitted line
η̂ = exp(−intercept / β̂)
Or: read η directly where the line crosses F = 63.2% on the Weibull paper.
Worked Example — 5 Bearings
Failures at: 850, 1100, 1350, 1600, 2100 hr
n = 5, Benard ranks:
i=1: F̂ = 0.70/5.4 = 0.130
i=2: F̂ = 1.70/5.4 = 0.315
i=3: F̂ = 2.70/5.4 = 0.500
i=4: F̂ = 3.70/5.4 = 0.685
i=5: F̂ = 4.70/5.4 = 0.870
→ Plot, fit line → β̂ ≈ 3.0, η̂ ≈ 1,580 hr
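In Python, the complete rank-regression procedure for the five bearings (numpy.polyfit performs the Step-3 least-squares fit):

import numpy as np

t = np.array([850, 1100, 1350, 1600, 2100])   # ordered failure times, hr
i = np.arange(1, len(t) + 1)
F = (i - 0.3) / (len(t) + 0.4)                # Benard median ranks

x = np.log(t)                                 # X = ln(t)
y = np.log(np.log(1 / (1 - F)))               # Y = ln ln[1/(1−F)]
beta, b0 = np.polyfit(x, y, 1)                # slope = β̂, intercept = −β̂·ln(η̂)
eta = np.exp(-b0 / beta)
print(round(beta, 2), round(eta))             # ≈ 3.0 and ≈ 1,580 hr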

📈 Weibull Quick Ref

  • η (Characteristic Life)

    63.2% of units fail by η. Always true regardless of β.

  • B10 Life

    Time by which 10% of units fail. Standard bearing and automotive spec metric.

  • Weibull Probability Plot

    Plot ln(ln(1/(1−F))) vs ln(t). Slope = β. Intercept gives η. Straight line confirms Weibull fit.

  • Random Number Gen.

x = η(−ln ξ)^(1/β) where ξ ~ Uniform(0,1) — inverting F(t) gives a Weibull failure time. From the Stockholm Distributions Handbook.

Series vs Parallel Systems

📊 Series (all must work) vs Parallel (at least one must work)
[Figure: series R_sys = 0.9 × 0.9 × 0.9 = 0.729, always below the worst component; parallel R_sys = 1 − (0.1 × 0.1) = 0.99, redundancy dramatically improves reliability]
Series System — ALL must work
Rsys = R₁ × R₂ × R₃ × … × Rₙ
Any single failure kills the system. Reliability always lower than weakest component.
Parallel System — ANY one works
Rsys = 1 − (1−R₁)(1−R₂)…(1−Rₙ)
Redundancy. All must fail for system failure. Higher reliability than best component.
💡

Design implication: Critical single-point failures (no redundancy = series) dramatically reduce system reliability. Adding even one parallel backup to a 0.9-reliability component raises that subsystem from 0.90 to 0.99 — a 10× reduction in its failure probability (0.10 → 0.01).
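In Python, both configurations reduce to one-line functions (independent components assumed):

from math import prod

def series(rs):
    return prod(rs)                         # all must work

def parallel(rs):
    return 1 - prod(1 - r for r in rs)      # any one suffices

print(series([0.9, 0.9, 0.9]))    # 0.729
print(parallel([0.9, 0.9]))       # 0.99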

Probability Foundations for Reliability — NIST 8.1.6

Reliability is fundamentally a probability — the probability that a device performs its intended function during a specified period under stated conditions. The four-function framework below is the mathematical backbone of all reliability analysis, per NIST Engineering Statistics Handbook Section 8.1.6.

The Four Functions — Complete Derivation Chain

NIST 8.1.6 — Mathematical Relationships Between Reliability Functions
① Probability Density Function f(t)
Definition: f(t) = dF(t)/dt
Requirements: f(t) ≥ 0, ∫₀^∞ f(t)dt = 1
Meaning: instantaneous failure rate density
② Cumulative Distribution Function F(t)
F(t) = P(T ≤ t) = ∫₀ᵗ f(u)du
F(0) = 0, lim F(t) = 1 as t→∞
Meaning: fraction failed by time t
③ Reliability (Survival) Function R(t)
R(t) = 1 − F(t) = P(T > t)
= ∫ₜ^∞ f(u)du
Meaning: probability of surviving beyond t
④ Hazard Function h(t) — the Key Function
h(t) = f(t) / R(t)
= −d[ln R(t)] / dt
Meaning: conditional failure rate at time t
given survival to t
Master Formula — Integrating the Hazard Function
H(t) = ∫₀ᵗ h(u)du [Cumulative Hazard Function]
R(t) = exp[−H(t)] ← universally valid
f(t) = h(t)·R(t) = h(t)·exp[−H(t)]
MTTF = ∫₀^∞ R(t)dt = E[T]

Five Distributions — h(t), R(t), F(t), f(t) Side-by-Side

Distribution | h(t) Hazard | R(t) Reliability | F(t) CDF | MTTF | Shape
Exponential | λ (constant) | e^(−λt) | 1 − e^(−λt) | 1/λ | Flat — useful life, β=1
Weibull | (β/η)(t/η)^(β−1) | exp[−(t/η)^β] | 1 − exp[−(t/η)^β] | η·Γ(1+1/β) | Power — all phases
Lognormal | φ(z)/[σt·Φ(−z)], z=(ln t−µ)/σ | 1 − Φ[(ln t−µ)/σ] | Φ[(ln t−µ)/σ] | exp(µ+σ²/2) | IFR then DFR — fatigue, corrosion
Normal | φ(z)/[1−Φ(z)], z=(t−µ)/σ | 1 − Φ[(t−µ)/σ] | Φ[(t−µ)/σ] | µ | IFR — tight wear-out
Gamma | Complex — see NIST 8.1.9 | 1 − I(t/β, k) (incomplete gamma) | I(t/β, k) | — | k<1: DFR, k=1: Exp, k>1: IFR

Hazard Function Shapes — The Physical Meaning

DFR — Decreasing
dh/dt < 0

Failure rate decreases with time. Indicates infant mortality — early failures remove weak units. Example: Weibull β < 1, Gamma k < 1.

CFR — Constant
dh/dt = 0

Constant failure rate. Memoryless — age does not affect remaining life. Exponential distribution. Example: electronic components in useful life.

IFR — Increasing
dh/dt > 0

Failure rate increases — component ages and wears out. Weibull β > 1, Normal, most mechanical components under fatigue and corrosion.

Bathtub (Mixed)
DFR → CFR → IFR

Real-world products combine all three phases. The lognormal has a unimodal hazard — rises then falls. Mixed Weibull populations generate bathtub curves.

Types of Events — Probability Rules

Mutually Exclusive

Cannot occur simultaneously.

P(A ∩ B) = 0
Independent Events

A's occurrence doesn't affect P(B).

P(B|A) = P(B)
Complementary

A' is the event that A does NOT occur.

P(A') = 1 − P(A)
Rule of Addition — Union (A or B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Mutually exclusive: P(A ∪ B) = P(A) + P(B)
Rule of Multiplication — Intersection (A and B)
P(A ∩ B) = P(A) × P(B|A)
Independent events: P(A ∩ B) = P(A) × P(B)

MTBF Worked Examples

Example 1 — MTTF vs MTBF

100 items tested for 10,000 hours. 5 items failed at 5,000 hours.

MTTF = (95×10,000 + 5×5,000) / 100
= 975,000 / 100 = 9,750 hrs

MTBF = (95×10,000 + 5×5,000) / 5
= 975,000 / 5 = 195,000 hrs
💡

MTTF divides by total units (100); MTBF divides by failed units only (5)

Example 2 — Hazard Function Derivation

Exponential distribution with λ = 0.001 failures/hr. Find h(t), R(t) at t = 500 hr:

f(t) = 0.001·e^(−0.001t)
F(t) = 1 − e^(−0.001t)
R(t) = e^(−0.001t)
h(t) = f(t)/R(t) = λ = 0.001 (constant)

R(500) = e^(−0.5) = 0.6065 = 60.65%
💡

Exponential h(t) = λ always — this is the memoryless property

Fault Tree Analysis — Top-Down Deductive Reliability

Fault Tree Analysis (FTA) is a top-down, deductive technique that models how a defined system failure (the top event) can occur through combinations of component failures and human errors. It uses Boolean logic gates to trace failure pathways, and it underpins the treatments in MIT 22.38 and MIL-STD-1629A. It complements FMEA: FTA asks "what combinations of events cause this failure?" while FMEA asks "what does each component failure cause?"

Academic foundation: MIT 22.38 (Prof. Golay) — Section I: Event Sequence Identification & Section XII: Probabilistic Risk Assessment · Rausand & Høyland, System Reliability Theory 2nd Ed. (Wiley, 2003) · MIL-STD-1629A FMECA procedures

The Logic Gates — Boolean Building Blocks

AND
AND Gate
All inputs must fail

Output event occurs only if all input events occur simultaneously. Represents redundancy — protective when components are independent.

P(T) = P(A) × P(B) × P(C)
Valid only when A, B, C are independent
OR
OR Gate
Any input causes failure

Output event occurs if at least one input event occurs. Most common gate — represents that any single failure propagates upward.

P(T) = 1 − (1−P(A))(1−P(B))(1−P(C))
Exact for independent events
BASIC
Basic Event
Leaf node — no further decomposition

The lowest-level failure event in the tree. Has an assigned failure probability λ (from field data, MIL-HDBK-217F, or manufacturer's data).

P(event) = 1 − e^(−λt) ≈ λt for small λt
UNDEVEL
Undeveloped Event
Not further analysed

An event not developed further — either insufficient data, or judged insufficiently important. Marked explicitly so reviewers know it was a conscious decision.

The FTA Process — 6 Steps

1
Define the Top Event precisely
State the exact undesired event: not "pump fails" but "pump fails to deliver ≥10 L/min at system pressure within 5 sec of demand signal." Ambiguity at this step invalidates everything below.
2
Define system boundaries and assumptions
What is in scope? What interfaces are excluded? What is the mission time? What is the operating environment? All stated explicitly.
3
Construct the tree top-down using Boolean gates
Decompose the top event into immediate causes connected by AND/OR gates. Continue decomposing each intermediate event until basic events are reached. Never skip levels.
4
Identify Minimal Cut Sets (MCS)
A cut set is a set of basic events whose simultaneous occurrence causes the top event. A minimal cut set cannot be reduced further — removing any element prevents the top event. These are the system's vulnerabilities.
5
Quantify — assign failure probabilities
Assign P(failure) to each basic event from field data, MIL-HDBK-217F, OREDA, or manufacturer specs. Propagate probabilities upward through gates.
6
Evaluate importance measures — prioritise action
Use Birnbaum importance, Fussell-Vesely importance, and Risk Reduction Worth (RRW) to rank which basic events most affect top-event probability. Focus design improvements on high-importance events.

Minimal Cut Sets — The Mathematics

For a system with minimal cut sets K₁, K₂, …, Kₘ, the top event T occurs if any cut set occurs completely. Using the inclusion-exclusion principle:

Exact (inclusion-exclusion)
P(T) = Σ P(Kᵢ) − Σ P(Kᵢ ∩ Kⱼ)
       + Σ P(Kᵢ ∩ Kⱼ ∩ Kₖ) − …
Rare Event Approximation (λt ≪ 1)
P(T) ≈ Σᵢ P(Kᵢ) = Σᵢ ∏ⱼ∈Kᵢ qⱼ
where qⱼ = probability of basic event j. Valid when P(T) ≪ 1.
Worked Example — Pump Failure System
Top event: Loss of cooling water
Minimal cut sets: K₁ = {A,B}, K₂ = {C}, K₃ = {A,D}
q_A = 0.01  q_B = 0.02
q_C = 0.005  q_D = 0.03

P(K₁) = 0.01 × 0.02 = 2×10⁻⁴
P(K₂) = 0.005
P(K₃) = 0.01 × 0.03 = 3×10⁻⁴

P(T) ≈ 2×10⁻⁴ + 5×10⁻³ + 3×10⁻⁴
      = 5.5×10⁻³
K₂ (single point {C}) dominates — highest priority for design improvement.
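A sketch of the quantification above — the rare-event approximation plus the second-order inclusion-exclusion term, using the worked example's basic-event probabilities:

```python
from itertools import combinations
from math import prod

# Minimal cut sets and basic-event probabilities from the worked example.
q = {"A": 0.01, "B": 0.02, "C": 0.005, "D": 0.03}
cut_sets = [{"A", "B"}, {"C"}, {"A", "D"}]

p_cut = [prod(q[e] for e in k) for k in cut_sets]
rare = sum(p_cut)                              # first-order (rare-event) term ≈ 5.5e-3

# Second-order correction: P(Ki ∩ Kj) = product over the union of their events
pairs = sum(prod(q[e] for e in ki | kj) for ki, kj in combinations(cut_sets, 2))

print(f"P(T) ≈ {rare:.2e} (rare event), {rare - pairs:.2e} with 2nd-order term")
```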

Component Importance Measures

Measure | Formula | Interpretation | Use when
Birnbaum (Structural) | I_B(i) = ∂P(T)/∂qᵢ | Rate of change of top-event probability with respect to component i's failure probability | Comparing sensitivity — which component improvement gives the biggest P(T) reduction?
Fussell-Vesely | I_FV(i) = P(at least one MCS containing i fails) / P(T) | Fraction of total risk contributed by cut sets containing component i | Maintenance prioritisation — where does this component contribute most to risk?
Risk Reduction Worth (RRW) | RRW(i) = P(T) / P(T | qᵢ=0) | Factor by which P(T) decreases if component i is made perfect (qᵢ → 0) | Investment decisions — what is the maximum achievable benefit of improving component i?
Risk Achievement Worth (RAW) | RAW(i) = P(T | qᵢ=1) / P(T) | Factor by which P(T) increases if component i is guaranteed to fail | Maintenance criticality — how important is it to keep this component working?
FTA vs FMEA — Complementary, Not Competing
Fault Tree Analysis
▸ Top-down (deductive)
▸ Starts from a specific failure
▸ Finds all combinations that cause it
▸ Handles complex logic & dependencies
▸ Quantitative probability output
▸ Best for: safety-critical top events
FMEA / FMECA
▸ Bottom-up (inductive)
▸ Starts from each component
▸ Traces all effects of each failure
▸ Covers the full system comprehensively
▸ RPN prioritisation (qualitative)
▸ Best for: comprehensive coverage of all failure modes

Best practice: use FMEA first for broad coverage, then FTA for deep analysis of the highest-severity failure modes identified by FMEA. Together they give both breadth and depth.

Reliability Block Diagrams — System Architecture & Redundancy

A Reliability Block Diagram (RBD) is a success-oriented model that shows how components must function for the system to function. Unlike FTA which models failure, RBD models success paths. Based on MIT 22.38 Section IX (Simple Logical Configurations) and Rausand & Høyland Chapter 4.

Academic foundation: MIT 22.38 Section IX — Complex Systems, Stress-Strength Interference, Markov Models · Rausand & Høyland, System Reliability Theory 2nd Ed. Ch. 4 · IEC 61078:2016 — Reliability block diagram techniques

Series, Parallel, and k-out-of-n Systems

Series System — Chain of Single Points

All components must work

System fails if any single component fails. Reliability is always lower than the weakest component. The engineering challenge: every component is a single point of failure.

Rs = R₁ × R₂ × R₃ × … × Rₙ
For n equal components R each:
Rs = Rⁿ   → rapidly decreasing
Worked Example — 4 pumps in series, R = 0.95 each:
Rs = 0.95⁴ = 0.8145  (18.6% chance of failure)
[Diagram: series RBD — IN → C₁ → C₂ → C₃ → OUT]
Active Parallel — Full Redundancy

Any one component is sufficient

System fails only if all parallel components fail. Reliability always exceeds the best single component. Each component runs continuously (hot standby).

Rs = 1 − ∏ᵢ(1 − Rᵢ) = 1 − (1−R)ⁿ
Worked Example — 3 pumps in parallel, R = 0.90 each:
Rs = 1 − (1−0.90)³ = 1 − 0.001 = 0.999
[Diagram: active-parallel RBD — C₁, C₂, C₃ side by side between IN and OUT]

k-out-of-n Systems — Voting Architectures

A k-out-of-n system succeeds if at least k of n components function. This generalises both series (k=n) and parallel (k=1). Common in safety systems: 2-out-of-3 voting gives high reliability without the cost of full parallel redundancy.

General Formula (equal components, R each)
Rk/n = Σⱼ₌ₖⁿ C(n,j) × Rʲ × (1−R)ⁿ⁻ʲ
where C(n,j) = n! / [j!(n−j)!] is the binomial coefficient. This is the binomial reliability sum from k to n.
Worked Example — 2-out-of-3 voting, R = 0.90
P(≥2 work) = C(3,2)×0.9²×0.1¹ + C(3,3)×0.9³×0.1⁰
= 3×0.81×0.1 + 1×0.729
= 0.243 + 0.729 = 0.972
Compare: pure parallel 1-out-of-3 = 0.999 (higher), series 3-out-of-3 = 0.729 (lower). 2-out-of-3 is the safety engineer's sweet spot.
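The binomial sum is a one-liner to implement. This sketch reproduces all three architectures from the comparison above:

```python
from math import comb

def r_k_of_n(k: int, n: int, r: float) -> float:
    """Reliability of a k-out-of-n system of identical components (binomial sum)."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))

# Series (3-of-3), 2-of-3 voting, and pure parallel (1-of-3) at R = 0.90
for k in (3, 2, 1):
    print(f"{k}-out-of-3 at R=0.90: {r_k_of_n(k, 3, 0.90):.3f}")
# -> 0.729, 0.972, 0.999 — matching the worked example
```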

Standby Redundancy — Cold, Warm, and Hot

Type | Standby State | Switch Reliability | Reliability Formula | Application
Hot Standby | Fully powered, running at full load — instant takeover | Near 1.0 (automatic) | Same as active parallel: R = 1 − (1−R)ⁿ | Aircraft hydraulics, nuclear safety systems
Warm Standby | Partially energised — reduced failure rate λ_s < λ during standby | High, with brief startup | Requires Markov model — intermediate between hot/cold | Generator sets, server farms
Cold Standby | De-energised — zero failure rate during standby | R_sw required (switch may fail) | R_s = e^(−λt)(1 + λt) for 1-unit standby with perfect switch | Backup pumps, emergency systems
Cold Standby — Derivation (1 active + 1 standby, perfect switch)
P(system works at t) = P(active survives to t) + P(active fails at τ < t, standby takes over and survives to t)
= e^(−λt) + ∫₀ᵗ λe^(−λτ) · e^(−λ(t−τ)) dτ
= e^(−λt) + λt·e^(−λt)
= e^(−λt)(1 + λt)
Significantly higher than active parallel [1 − (1 − e^(−λt))²] because the standby unit does not accumulate ageing during the standby period.
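A quick numerical comparison of the two formulas (λ and the mission times below are illustrative, not from the text):

```python
import numpy as np

# Cold standby R = e^(-λt)(1+λt) vs active parallel 1-(1-e^(-λt))²,
# assuming a perfect switch; λ = 0.001/hr is an illustrative rate.
lam = 0.001
for t in (500, 1000, 2000):
    cold = np.exp(-lam * t) * (1 + lam * t)
    parallel = 1 - (1 - np.exp(-lam * t)) ** 2
    print(f"t={t}: cold standby {cold:.4f} vs active parallel {parallel:.4f}")
```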

Stress-Strength Interference — MIT 22.38 Section IX.3

A component fails when applied stress S exceeds its strength R. Both are random variables. Reliability = P(R > S). This is the probabilistic basis for design margins.

General Formula
R = P(Strength > Stress)
R = ∫₋∞^∞ f_S(s) · [1 − F_R(s)] ds
= ∫₋∞^∞ f_S(s) · P(R > s) ds
Normal-Normal Case (analytical result)
If S ~ N(µ_S, σ_S²) and R ~ N(µ_R, σ_R²):
(R−S) ~ N(µ_R−µ_S, σ_R²+σ_S²)
Reliability = Φ[(µ_R−µ_S) / √(σ_R²+σ_S²)]
= Φ[z_margin]
This z_margin is the "reliability index" β used in structural reliability and ISO 2394.

Accelerated Life Testing — Compressing Time to Failure

ALT subjects products to stresses (temperature, voltage, vibration, humidity) higher than normal use conditions to induce failures faster, then models the stress-life relationship to extrapolate reliability at use conditions. The core challenge: accelerate only the same failure mechanisms that would occur in service.

References: Elsayed, Reliability Engineering (Addison-Wesley, 1996) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · MIL-HDBK-217F Reliability Prediction · IEC 60068 Environmental Testing Standards · University of Maryland ENRE 641 — Accelerated Life Testing course

The Three Primary Life-Stress Models

Model 1 — Arrhenius (Temperature)

Most widely used ALT model

Derived from the Arrhenius equation for chemical reaction rates. Valid when the dominant failure mechanism is thermally activated — oxidation, corrosion, electromigration, diffusion, creep.

Life-Temperature Relationship
L(T) = A · exp(E_a / kT)
L = characteristic life (B50, MTTF, η), A = pre-exponential constant, E_a = activation energy (eV), k = Boltzmann constant = 8.617×10⁻⁵ eV/K, T = temperature in Kelvin
Acceleration Factor
AF = L_use / L_test = exp[E_a/k × (1/T_use − 1/T_test)]
AF tells you how many use-hours one test-hour represents
Worked Example — Semiconductor Oxide Degradation
E_a = 0.7 eV (oxide degradation)
T_use = 55°C = 328 K
T_test = 125°C = 398 K

AF = exp[(0.7 / 8.617×10⁻⁵) × (1/328 − 1/398)]
= exp[8123 × (0.003049 − 0.002513)]
= exp[8123 × 0.000536]
= exp[4.354]
= 77.8×
Interpretation: 1,000 hours at 125°C = 77,800 hours (≈8.9 years) at 55°C use temperature — with E_a = 0.7 eV
Typical E_a values:
0.3–0.5 eV: Electromigration in Al
0.5–0.7 eV: Oxide breakdown
0.7–1.0 eV: Corrosion mechanisms
1.0–1.4 eV: Si-SiO₂ interface traps
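A small calculator for the acceleration factor; the call below reproduces the oxide-degradation example (temperatures in kelvin, as in the text):

```python
from math import exp

K_BOLTZMANN = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev: float, t_use_k: float, t_test_k: float) -> float:
    """AF = exp[Ea/k · (1/T_use − 1/T_test)], temperatures in kelvin."""
    return exp(ea_ev / K_BOLTZMANN * (1.0 / t_use_k - 1.0 / t_test_k))

# Oxide degradation: Ea = 0.7 eV, 55 °C use (328 K) vs 125 °C test (398 K)
print(f"AF ≈ {arrhenius_af(0.7, 328.0, 398.0):.1f}")  # ≈ 77.9 (the text rounds to 77.8)
```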
Model 2 — Inverse Power Law (Voltage / Stress)

For non-thermal stress: voltage, load, pressure

Used when failure mechanism is driven by mechanical stress, voltage, or other non-thermal accelerants. L(S) follows a power law relationship with the stress level S.

Life-Stress Relationship
L(S) = C / Sⁿ
C = constant, S = stress level (V, MPa, Hz), n = inverse power law exponent (fitted from data)
AF = (S_test / S_use)ⁿ
Typical n: 2–4 for dielectric breakdown, 3–6 for capacitor voltage stress
Worked Example — Capacitor Voltage Stress
Rated voltage: V_use = 50V
Test voltage: V_test = 100V
Power law exponent: n = 4

AF = (100/50)⁴ = 2⁴ = 16×
Interpretation: Testing at 2× rated voltage compresses time by 16× for a power law exponent of 4
Model 3 — Eyring (Temperature + Second Stress)

Extends Arrhenius to include a second stress variable (humidity, voltage, vibration). Derived from quantum mechanics (reaction rate theory). Used in humidity + temperature testing (85°C/85% RH, JEDEC JESD22-A101).

Generalised Eyring Model
L(T,V) = (A/T) · exp(E_a/kT) · exp(−(B + C/T)·V)
T = temperature (K), V = second stress, A, B, C = model parameters fitted from multi-stress test data
Common Multi-Stress ALT Test Conditions
Test | Stress 1 | Stress 2 | Standard
HAST | 130°C | 85% RH | JESD22-A110
85/85 | 85°C | 85% RH | JESD22-A101
THB | 85°C | 85% RH + bias | AEC-Q100
HTOL | 125–150°C | Full voltage | JESD22-A108

HALT, HASS, and ESS — Qualitative vs Quantitative ALT

HALT

Highly Accelerated Life Test

Apply stepwise increasing stress (temperature, vibration, both combined) to failure. Goal: find the operating limit and destruct limit. Qualitative — not intended for life prediction, but for design robustness discovery.

Used at: design validation phase. Output: design margins, failure modes to address before production.
HASS

Highly Accelerated Stress Screening

Production screen applied to every unit (or sample). Uses stress levels below HALT destruct limits to precipitate latent defects before shipment without consuming life of good units.

Used at: production. Output: defect escape rate reduction, infant mortality elimination.
ESS

Environmental Stress Screening

Temperature cycling and/or random vibration screen applied post-assembly. MIL-HDBK-2164 defines profiles. Addresses infant mortality phase of bathtub curve — forces early failures to occur in factory, not in the field.

Typical profile: −40°C to +70°C, 5 cycles, 3–5 G_rms vibration. Governed by MIL-HDBK-2164A.

Combining Weibull with ALT — Life Data Analysis

In ALT data analysis, Weibull distribution is fitted at each stress level. The assumption is that the shape parameter β is constant across stress levels (same failure mechanism), while the scale parameter η changes with stress according to the life-stress model.

Arrhenius-Weibull Model
η(T) = A · exp(E_a / kT)
R(t,T) = exp[−(t/η(T))^β]
β = constant (same failure mechanism)
Parameters estimated via Maximum Likelihood Estimation (MLE) from pooled data across all stress levels.
ALT Data Analysis Workflow
01 Run tests at ≥3 stress levels above use stress
02 Fit Weibull to each stress level — verify β is consistent
03 Plot ln(η) vs 1/T — confirm linearity (Arrhenius)
04 Estimate E_a from slope of ln(η) vs 1/T line
05 Extrapolate η to use stress using life-stress model
06 Compute R(t) and B10 at use conditions with confidence bounds
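Steps 03–05 reduce to a linear regression of ln(η) on 1/T. The sketch below uses hypothetical fitted η values at three test temperatures to show the mechanics; none of the numbers come from the text:

```python
import numpy as np

K = 8.617e-5                                   # Boltzmann constant, eV/K
T = np.array([398.0, 423.0, 448.0])            # test temperatures (K)
eta = np.array([1200.0, 450.0, 190.0])         # HYPOTHETICAL Weibull η at each T (hr)

# ln η = ln A + (Ea/k)·(1/T)  →  slope of ln η vs 1/T is Ea/k
slope, intercept = np.polyfit(1.0 / T, np.log(eta), 1)
ea = slope * K
print(f"estimated Ea ≈ {ea:.2f} eV")

t_use = 328.0                                  # use temperature (K)
eta_use = np.exp(intercept + slope / t_use)    # step 05: extrapolate η to use stress
print(f"extrapolated η(328 K) ≈ {eta_use:,.0f} hr")
```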

Reliability Demonstration Testing — Proving What You Claim

A reliability demonstration test (RDT) answers a specific question: "Can I claim with C% confidence that the true reliability is at least R* at time t?" It requires defining a reliability target, a confidence level, a mission time, and a test termination criterion — before running a single unit.

References: Meeker & Escobar, Statistical Methods for Reliability Data Ch. 10 (Wiley, 1998) · MIL-HDBK-781 — Reliability Testing for Engineering Development, Qualification, and Production · IEC 61124 — Reliability Testing: Compliance Tests for Constant Failure Rate and Constant Failure Intensity

The Mathematics of Demonstration Testing

The fundamental statistical basis: if a sample of n units is tested and c failures are observed, the lower confidence bound on the true failure probability p at a given confidence level C is derived from the binomial distribution (or Poisson for time-terminated tests).

Zero-Failure Test (c = 0) — Success Run
Claim: R* at confidence C
Required sample: n = ln(1−C) / ln(R*)
Or: n = ln(α) / ln(R*)  where α = 1−C
If all n units pass (zero failures), you can claim reliability ≥ R* at confidence C. The most efficient test when you expect very high reliability.
With c Failures Allowed (Binomial basis)
1 − C = Σⱼ₌₀ᶜ C(n,j) · (1−R*)ʲ · (R*)ⁿ⁻ʲ
Solve for n given C, R*, and allowed failures c
Allowing failures increases n required but reduces the risk of falsely rejecting a good product.
Sample Size Required — Zero Failure Test
Reliability R* | 90% Confidence | 95% Confidence | 99% Confidence
0.90 | 22 | 29 | 44
0.95 | 45 | 59 | 90
0.99 | 230 | 299 | 459
0.999 | 2,302 | 2,995 | 4,603
0.9999 | 23,026 | 29,957 | 46,051
Formula: n = ln(1−C) / ln(R*). Demonstrating very high reliability requires enormous samples — the practical argument for ALT.
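The success-run formula is simple to automate; this sketch regenerates the table rows:

```python
from math import ceil, log

def success_run_n(r_star: float, confidence: float) -> int:
    """Zero-failure sample size: n = ln(1−C) / ln(R*), rounded up."""
    return ceil(log(1 - confidence) / log(r_star))

for r in (0.90, 0.95, 0.99, 0.999):
    print(r, [success_run_n(r, c) for c in (0.90, 0.95, 0.99)])
# 0.90 -> [22, 29, 44]; 0.95 -> [45, 59, 90]; 0.99 -> [230, 299, 459]; ...
```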

Time-Terminated Tests — Poisson Basis

When units are tested for a fixed time T (each), total accumulated test time = n × T. For an exponential (constant failure rate) model, the number of failures follows a Poisson distribution. This allows MTBF/failure rate demonstration.

Lower Confidence Bound on MTBF
MTBF_lower = 2T_total / χ²(α, 2c+2)
T_total = total accumulated test time
c = observed failures
α = 1 − C (risk level)
χ² quantile from chi-squared distribution with 2c+2 degrees of freedom
Zero-Failure Time Test (c=0)
MTBF_lower = 2T_total / χ²(α, 2) = −T_total / ln(α)
χ²(α,2) = −2 ln(α) for 2 degrees of freedom
Worked Example — MTBF Demonstration
Requirement: MTBF ≥ 5,000 hr
Confidence required: 90% (α = 0.10)
Test plan: 10 units × 1,000 hr each
T_total = 10,000 hr
Result: 0 failures observed

MTBF_lower = −10,000 / ln(0.10)
= 10,000 / 2.303
= 4,343 hr
✗ 4,343 hr < 5,000 hr — the claim is NOT yet demonstrated
With zero failures, demonstrating MTBF ≥ 5,000 hr at 90% confidence requires T_total ≥ 5,000 × 2.303 ≈ 11,513 hr (e.g., 12 units × 1,000 hr each)
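A sketch of the bound using scipy's chi-squared quantile, reproducing the example above:

```python
from scipy.stats import chi2

def mtbf_lower_bound(t_total: float, failures: int, confidence: float) -> float:
    """Lower confidence bound on MTBF for a time-terminated exponential test:
    MTBF_lower = 2·T_total / χ²(2c+2), using the C-quantile (upper-tail α = 1−C)."""
    return 2.0 * t_total / chi2.ppf(confidence, 2 * failures + 2)

# Zero-failure example: 10,000 hr accumulated, 0 failures, 90% confidence
print(f"MTBF lower bound ≈ {mtbf_lower_bound(10_000, 0, 0.90):,.0f} hr")  # ≈ 4,343 hr
```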

Producer & Consumer Risk — The OC Curve for Reliability

Consumer Risk (β)

Probability that a product with reliability below the requirement passes the test. A false accept.

β = P(accept | R < R*)
Typically ≤ 0.10 for safety-critical systems. Reducing β requires larger n or fewer allowed failures.
Producer Risk (α)

Probability that a product with reliability above the requirement fails the test. A false reject.

α = P(reject | R ≥ R*)
Allowed c > 0 reduces producer risk. The discrimination ratio d = R_acceptable / R_rejectable controls the sharpness of the OC curve.
Discrimination Ratio d

Ratio between the MTBF that should be accepted (θ₁) and the MTBF that should be rejected (θ₀).

d = θ₁ / θ₀ ≥ 1
Larger d → easier to discriminate → smaller test required. MIL-HDBK-781 defines standard test plans for d = 1.5, 2.0, 3.0.
Sources for this module: MIT OCW 22.38 (Prof. M. Golay) — Probability and Its Applications to Reliability, Quality Control, and Risk Assessment · Rausand & Høyland, System Reliability Theory 2nd Ed. (Wiley, 2003) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · Elsayed, Reliability Engineering (Addison-Wesley, 1996) · MIL-HDBK-217F, MIL-HDBK-781, MIL-STD-1629A · IEC 60300, IEC 61078, IEC 61124 · University of Maryland ENRE 641

Distribution Functions — Complete Reliability Toolkit

NIST 8.1.7–8.1.9 covers the full family of distributions used in reliability engineering. Each distribution is defined by its hazard function shape — choosing the right one is not a statistical preference but a physical claim about the failure mechanism.

NIST reference: Engineering Statistics Handbook Sections 8.1.7 (Exponential), 8.1.8 (Weibull), 8.1.9 (Lognormal, Normal, Gamma) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · Nelson, Accelerated Testing (Wiley, 1990)

Exponential Distribution — NIST 8.1.7

The exponential is the only continuous distribution with the memoryless property: P(T > t+s | T > t) = P(T > s). A component that has survived to time t has the same remaining life distribution as a new component. This applies only during the useful-life phase (constant failure rate).

Complete Formula Set — Exponential(λ)
f(t) = λ·e^(−λt), t ≥ 0
F(t) = 1 − e^(−λt)
R(t) = e^(−λt)
h(t) = λ (constant)
H(t) = λt
MTTF = 1/λ
Var(T) = 1/λ²
Median = ln(2)/λ = 0.693/λ
Worked Example — Electronic Component Reliability
Component failure rate: λ = 2×10⁻⁵ failures/hr
MTBF = 1/λ = 50,000 hr

R(t=8760 hr) = e^(−2×10⁻⁵ × 8760)
= e^(−0.1752) = 83.9% (1-year reliability)

R(t=40000) = e^(−0.8) = 44.9%
P(fail before 40,000 hr) = 55.1%
⚠️

Common misconception: MTBF = 50,000 hr does NOT mean the component lasts 50,000 hr. It means ~63.2% fail BEFORE 50,000 hr. At t = MTBF, R(MTBF) = e⁻¹ = 36.8% survive.

Lognormal Distribution — NIST 8.1.9

If ln(T) ~ Normal(µ, σ²), then T ~ Lognormal(µ, σ). Best for failure mechanisms driven by multiplicative damage accumulation: fatigue, corrosion, crack propagation. The hazard function is unimodal — rises then decreases (IFR then DFR), making it physically realistic for degradation processes.

Complete Formula Set — Lognormal(µ, σ)
f(t) = φ[(ln t−µ)/σ] / (σt)
F(t) = Φ[(ln t−µ)/σ]
R(t) = 1 − Φ[(ln t−µ)/σ]
h(t) = f(t)/R(t) [no closed form]
MTTF = exp(µ + σ²/2)
Median = e^µ
Var(T) = e^(2µ+σ²)·(e^(σ²)−1)
φ = standard normal PDF, Φ = standard normal CDF
Worked Example — Fatigue Life of Steel Shaft

µ = 10.5, σ = 0.8 (in ln-hours). Find R at 30,000 hr and MTTF.

z = (ln(30000) − 10.5) / 0.8
= (10.309 − 10.5) / 0.8 = −0.239
R(30000) = 1 − Φ(−0.239) = 59.4%

MTTF = exp(10.5 + 0.8²/2) = exp(10.82) ≈ 50,011 hr
Applications: Fatigue, corrosion, stress-corrosion cracking, electromigration, semiconductor oxide breakdown, biological failure times

Normal Distribution in Reliability

The Normal(µ, σ) is appropriate when failure times have a symmetric distribution — tight wear-out mechanisms where fatigue accumulates uniformly. The hazard function is strictly increasing (IFR), making it suitable for components that reliably wear out at a predictable age.

Complete Formula Set — Normal(µ, σ)
f(t) = φ[(t−µ)/σ] / σ
F(t) = Φ[(t−µ)/σ]
R(t) = 1 − Φ[(t−µ)/σ]
h(t) = φ(z) / [σ(1−Φ(z))] strictly IFR
MTTF = µ
B10 = µ − 1.282σ (10th percentile)
Worked Example — Brake Pad Wear-Out

µ = 60,000 km, σ = 8,000 km. Find B10 and R at 45,000 km.

B10 = 60,000 − 1.282×8,000 = 49,744 km
R(45,000) = 1 − Φ[(45000−60000)/8000]
= 1 − Φ(−1.875) = 97.0%
Applications: Mechanical wear-out (pistons, gears, brake pads), light bulb filament life, highly-controlled manufacturing processes

Gamma Distribution — NIST 8.1.9

Gamma(k, β) is the distribution of the sum of k independent exponential(1/β) random variables. Shape parameter k controls hazard function shape: k < 1 gives DFR, k = 1 gives exponential, k > 1 gives IFR.

Complete Formula Set — Gamma(k, β)
f(t) = t^(k−1)·e^(−t/β) / [β^k·Γ(k)]
F(t) = I(t/β, k) [incomplete gamma ratio]
R(t) = 1 − I(t/β, k)
MTTF = kβ
Var(T) = kβ²
Mode = (k−1)β for k ≥ 1
Distribution Selection Guide — NIST
Failure Mechanism | Best Distribution
Constant random failures | Exponential
Infant mortality / any phase | Weibull
Fatigue, corrosion, crack growth | Lognormal
Symmetric, tight wear-out | Normal
Sum of k failure events | Gamma
Unknown — fit all, use AIC/BIC | Probability plot comparison

Parameter Estimation — MLE, Rank Regression & Censored Data

Fitting a reliability distribution to field or test data is a statistical inference problem. Two main methods: Maximum Likelihood Estimation (MLE) — the NIST-preferred method for accuracy and confidence interval generation — and Rank Regression — graphical, intuitive, and useful for small samples. Both must handle censored data correctly.

NIST reference: Engineering Statistics Handbook Sections 8.2.1 (Kaplan-Meier), 8.2.2 (Probability Plotting), 8.2.4 (Confidence Intervals), 8.2.6 (MLE) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) Chapters 3–5

Censoring — The Core Challenge of Reliability Data

Complete (Exact) Failure

The exact failure time tᵢ is known. Contributes f(tᵢ) to the likelihood. The ideal case — often impractical in life testing.

L contribution: f(tᵢ)
Right Censored (Suspended)

Unit survived to time cᵢ (end of test or withdrawal). We know T > cᵢ but not exact failure time. Most common type.

L contribution: R(cᵢ) = 1−F(cᵢ)
Left Censored

Unit already failed before first inspection at time dᵢ. We know T < dᵢ. Common in inspection data.

L contribution: F(dᵢ)
Interval Censored

Failure in interval [Lᵢ, Rᵢ] — inspected OK at Lᵢ, failed at Rᵢ. Very common in periodic inspection.

L contribution: F(Rᵢ) − F(Lᵢ)

Maximum Likelihood Estimation (MLE) — NIST 8.2.6

MLE finds the parameter values that make the observed data most probable. For mixed censored data with r failures and (n−r) censored units:

Full Likelihood — Mixed Censored Data
L(θ) = C · ∏ᵢ∈failures f(tᵢ; θ) · ∏ⱼ∈censored R(cⱼ; θ)

Log-likelihood: ℓ(θ) = Σᵢ ln f(tᵢ) + Σⱼ ln R(cⱼ)

Maximise ℓ(θ) by solving: ∂ℓ/∂θ = 0 (numerically)
MLE for Weibull — Score Equations
∂ℓ/∂β: r/β + Σ ln(tᵢ) − (1/ηᵝ)Σ tᵢᵝ ln(tᵢ) = 0
∂ℓ/∂η: −rβ/η + (β/η^(β+1))Σ tᵢᵝ = 0
→ Solve numerically (Newton-Raphson or EM algorithm)
MLE Advantages (NIST-preferred)
  • Asymptotically unbiased and efficient
  • Handles all censoring types correctly
  • Provides Fisher information for confidence intervals
  • Can be used with covariates (regression models)
  • Standard in Minitab, ReliaSoft Weibull++
MLE Confidence Intervals — Fisher Matrix Method
Var(θ̂) ≈ [−∂²ℓ/∂θ²]⁻¹ (Fisher information)

95% CI on R(t): use log-log transform
θ = ln(−ln R̂(t))
Var(θ) ≈ [Σ dᵢ/(nᵢ(nᵢ−dᵢ))] / [ln R̂(t)]²
CI: R̂(t)^exp(±1.96√Var(θ))
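A minimal MLE sketch for Weibull data with right censoring, numerically maximising the mixed-censoring log-likelihood above (the failure and suspension times reuse the Kaplan-Meier example in the next section; scipy's Nelder-Mead does the optimisation):

```python
import numpy as np
from scipy.optimize import minimize

fail = np.array([500.0, 1100.0, 1400.0, 2200.0, 2700.0, 3200.0])  # exact failures
cens = np.array([800.0, 1800.0])                                  # suspensions

def neg_loglik(params):
    beta, eta = params
    if beta <= 0 or eta <= 0:
        return np.inf
    ln_f = (np.log(beta / eta) + (beta - 1) * np.log(fail / eta)
            - (fail / eta) ** beta)           # failures contribute ln f(t)
    ln_r = -(cens / eta) ** beta              # suspensions contribute ln R(c)
    return -(ln_f.sum() + ln_r.sum())

res = minimize(neg_loglik, x0=[1.0, float(np.median(fail))], method="Nelder-Mead")
beta_hat, eta_hat = res.x
print(f"beta ≈ {beta_hat:.2f}, eta ≈ {eta_hat:.0f} hr")
```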

Kaplan-Meier Estimator — Non-Parametric Survival (NIST 8.2.1)

The Kaplan-Meier estimator computes the empirical survival function without assuming any parametric form. Essential for exploratory analysis. Correctly handles right-censored data (suspended items).

KM Formula
R̂(t) = ∏ᵢ: tᵢ≤t (1 − dᵢ/nᵢ)

where:
tᵢ = ordered failure times
dᵢ = deaths (failures) at tᵢ
nᵢ = units at risk just before tᵢ
(includes censored units still alive)
KM is the NPMLE (non-parametric MLE) of R(t) — statistically optimal, not just heuristic. Greenwood's formula gives the variance: Var[R̂(t)] ≈ [R̂(t)]² · Σ dᵢ/[nᵢ(nᵢ−dᵢ)]
KM Example — 8 Units, 2 Censored
Events: 500, 800†, 1100, 1400, 1800†, 2200, 2700, 3200
(† = censored/suspended)

t=500: n=8, d=1 R̂ = 1·(7/8) = 0.875
t=1100: n=6, d=1 R̂ = 0.875·(5/6) = 0.729
t=1400: n=5, d=1 R̂ = 0.729·(4/5) = 0.583
t=2200: n=3, d=1 R̂ = 0.583·(2/3) = 0.389
t=2700: n=2, d=1 R̂ = 0.389·(1/2) = 0.194
t=3200: n=1, d=1 R̂ = 0.194·(0/1) = 0.000
Censored units at 800 and 1800 hr drop from the risk set but are accounted for via reduced nᵢ at subsequent events.
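The estimator is short enough to write by hand. This sketch reproduces the 8-unit example step by step:

```python
# Kaplan-Meier sketch reproducing the 8-unit example above: (time, failed?)
events = [(500, True), (800, False), (1100, True), (1400, True),
          (1800, False), (2200, True), (2700, True), (3200, True)]

at_risk = len(events)
r_hat = 1.0
for time, failed in sorted(events):      # walk through ordered event times
    if failed:
        r_hat *= 1 - 1 / at_risk         # KM factor (1 − d_i/n_i), d_i = 1 here
        print(f"t={time}: n={at_risk}, R-hat = {r_hat:.3f}")
    at_risk -= 1                         # failures and censorings both leave the risk set
```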

Competing Failure Modes & Stress-Strength Interference

Real systems fail from multiple distinct mechanisms — corrosion, fatigue, overload — acting simultaneously in competition. A single Weibull distribution fitted to mixed data gives misleading results. Understanding competing failure modes and probabilistic stress-strength interaction is essential for design and maintenance decisions.

Reference: NIST 8.1.10 — Competing Failure Modes · Meeker & Escobar Ch. 15 · Nelson, Accelerated Testing Ch. 11 (Wiley, 1990) · MIT 22.38 Section IX.3 — Stress-Strength Interference

Competing Failure Modes — The Series System of Mechanisms

If a unit can fail by any of k independent modes, the system survives only if all modes survive. This is a series reliability model on the failure mechanisms:

Competing Failure Modes — Key Equations
T = min(T₁, T₂, …, Tₖ) where Tᵢ = time to failure by mode i

R_sys(t) = R₁(t) · R₂(t) · … · Rₖ(t) (if modes are independent)
F_sys(t) = 1 − ∏ᵢ [1 − Fᵢ(t)]
h_sys(t) = h₁(t) + h₂(t) + … + hₖ(t) ← hazard functions ADD

For exponential modes: λ_sys = λ₁ + λ₂ + … + λₖ

Mixed Weibull Populations — Bimodal Failure Data

2-Component Mixture Weibull
F(t) = p·F₁(t) + (1−p)·F₂(t)
f(t) = p·f₁(t) + (1−p)·f₂(t)
R(t) = p·R₁(t) + (1−p)·R₂(t)

where p = fraction from subpopulation 1
F₁(t) = Weibull(β₁, η₁) [infant mortality]
F₂(t) = Weibull(β₂, η₂) [wear-out]
Note: Mixture R(t) ≠ product of component R(t). This is a mixture of populations, not a series system.
Worked Example — Electronic Assembly

10% of assemblies have a solder defect (β₁=0.6, η₁=200 hr), 90% are good (β₂=3.5, η₂=12,000 hr).

p = 0.10, (1−p) = 0.90

At t = 100 hr:
F₁(100) = 1 − exp[−(100/200)^0.6] = 0.483
F₂(100) ≈ 0.000
F_mix(100) = 0.10×0.483 + 0.90×0 = 4.8%

At t = 8,000 hr:
F₁(8000) ≈ 1.000
F₂(8000) = 1 − exp[−(8000/12000)^3.5] = 0.215
F_mix(8000) = 0.10×1.000 + 0.90×0.215 = 29.3%
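A quick check of the mixture arithmetic above:

```python
import numpy as np

# F_mix(t) = p·F1(t) + (1−p)·F2(t) for the two-subpopulation example.
def weibull_cdf(t, beta, eta):
    return 1.0 - np.exp(-(t / eta) ** beta)

p = 0.10
for t in (100.0, 8000.0):
    f1 = weibull_cdf(t, 0.6, 200.0)       # defective subpopulation (infant mortality)
    f2 = weibull_cdf(t, 3.5, 12000.0)     # good subpopulation (wear-out)
    print(f"t={t:>6.0f}: F1={f1:.3f}, F2={f2:.3f}, F_mix={p*f1 + (1-p)*f2:.3f}")
# t=100: F1=0.483, F2=0.000, F_mix=0.048 ; t=8000: F1=1.000, F2=0.215, F_mix=0.293
```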

Stress-Strength Interference Model — NIST 8.1.11

General Formula
R = P(Strength > Stress) = P(R > S)

R = ∫₋∞^∞ f_S(s) · [1 − F_R(s)] ds
= ∫₋∞^∞ f_S(s) · P(R > s) ds
Normal-Normal Analytical Solution
S ~ N(µ_S, σ_S²), R ~ N(µ_R, σ_R²)
(R−S) ~ N(µ_R−µ_S, σ_R²+σ_S²)

Reliability = Φ[z]
z = (µ_R − µ_S) / √(σ_R² + σ_S²)
z = "reliability index" β
Worked Example — Shaft Design
R ~ N(µ_R=500 MPa, σ_R=40 MPa)
S ~ N(µ_S=350 MPa, σ_S=30 MPa)

z = (500 − 350) / √(40² + 30²)
= 150 / √2500 = 150/50 = 3.0

Reliability = Φ(3.0) = 99.865%
P(failure) = 1,350 ppm

Safety factor = µ_R/µ_S = 500/350 = 1.43
→ Safety factor 1.43 → 1,350 ppm failure
→ z = 3.0 is the real risk metric
📌

Safety Factor vs z: A high deterministic safety factor with high variability may give worse reliability than a lower safety factor with tight distributions. The reliability index z accounts for both mean margins AND variability — it is the true engineering measure of safety.
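The normal-normal case needs nothing beyond Φ. This sketch reproduces the shaft example using only the standard library:

```python
from statistics import NormalDist

# Normal-normal interference: reliability = Φ(z),
# z = (µ_R − µ_S) / sqrt(σ_R² + σ_S²).
def interference_reliability(mu_r, sd_r, mu_s, sd_s):
    z = (mu_r - mu_s) / (sd_r**2 + sd_s**2) ** 0.5
    return z, NormalDist().cdf(z)

z, rel = interference_reliability(500, 40, 350, 30)
print(f"z = {z:.1f}, reliability = {rel:.5f}, P(fail) ≈ {(1 - rel) * 1e6:,.0f} ppm")
# z = 3.0, reliability = 0.99865, ≈ 1,350 ppm — matching the worked example
```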

Maintainability & Availability — Repairable Systems

Most real-world systems are repairable. Reliability alone is insufficient; engineering must also quantify maintainability (ease and speed of repair) and availability (net fraction of time the system is operational). This section covers NIST 8.4 and renewal theory fundamentals.

Reference: NIST Engineering Statistics Handbook Section 8.4 · Rausand & Høyland, System Reliability Theory 2nd Ed. Ch. 10 · IEC 60300-3-5 · MIL-HDBK-470A Designing and Developing Maintainable Products and Systems

Three Levels of Availability

Inherent Availability A_i

Design-level — ideal conditions

Considers only corrective maintenance. Ignores PM time, logistics, supply delays. The theoretical maximum.

A_i = MTBF / (MTBF + MTTR)
Achieved Availability A_a

Operations — CM + PM included

Includes corrective and preventive maintenance downtime. Does not include logistics/administrative delays.

A_a = MTBM / (MTBM + M̄)
MTBM = Mean Time Between Maintenance (all types), M̄ = mean active maintenance time
Operational Availability A_o

Real-world — all delays included

Includes logistics delay time (LDT) and administrative delay time (ADT). The real-world user experience.

A_o = Uptime / (Uptime + Downtime)
Always: A_o ≤ A_a ≤ A_i

Steady-State Availability — Markov Model Derivation

Two-State Markov Model — Exact Derivation
Transitions:
UP → DOWN at rate λ (failure)
DOWN → UP at rate µ = 1/MTTR

A(t) = µ/(λ+µ) + [λ/(λ+µ)]·e^(−(λ+µ)t)

Steady-state (t → ∞):
A(∞) = µ/(λ+µ) = MTBF/(MTBF+MTTR)
Worked Example
MTBF = 1,000 hr, MTTR = 4 hr
A(∞) = 1000/1004 = 99.60%

MTBF → 2,000 hr (2× reliability improvement):
A = 2000/2004 = 99.80% (+0.20%)

MTTR → 2 hr (2× maintainability improvement):
A = 1000/1002 = 99.80% (+0.20%)

→ Equal gain! Compare investment costs.
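A sketch of the transient and steady-state availability for the worked example's λ and µ:

```python
import numpy as np

# Two-state Markov availability for the worked example:
# MTBF = 1,000 hr (λ = 0.001/hr), MTTR = 4 hr (µ = 0.25/hr).
lam, mu = 1.0 / 1000.0, 1.0 / 4.0

def availability(t):
    s = lam + mu
    return mu / s + (lam / s) * np.exp(-s * t)   # A(t) from the derivation above

for t in (0.0, 5.0, 20.0, 100.0):
    print(f"A({t:>5.0f} hr) = {availability(t):.4f}")
print(f"steady state = {mu / (lam + mu):.4f}")    # 1000/1004 ≈ 0.9960
```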

Optimal PM Interval — Cost Minimisation

Cost-Based Optimal PM Interval

C_P = planned PM cost, C_F = corrective failure cost. Under an age-replacement policy (planned PM at age t, corrective replacement on failure), optimise the long-run cost rate:

C(t) = [C_P·R(t) + C_F·F(t)] / M(t)

M(t) = ∫₀ᵗ R(u)du [expected cycle length]

Solve dC(t)/dt = 0 → find t*
For Weibull β > 1 (IFR) a finite optimum t* exists; the stronger the wear-out (higher β) and the larger C_F/C_P, the earlier the optimum. PM is ineffective if β ≤ 1 (CFR or DFR) — run-to-failure is optimal.
Worked Example — Pump Seal

β=2.5, η=3,000 hr. C_P=£500 (planned), C_F=£8,000 (failure + downtime cost).

t* ≈ 870 hr (found numerically — see the sketch below)
F(870) = 1 − exp[−(870/3000)^2.5] = 0.044

Cost rate with PM ≈ £0.97/hr
Run-to-failure: £8,000/MTTF = £8,000/2,662 hr ≈ £3.01/hr
→ PM cuts the maintenance cost rate by roughly two-thirds
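The optimum is found by a simple numerical search over the cost-rate curve. A sketch for the pump-seal example, using the age-replacement formula from this section:

```python
import numpy as np
from scipy.integrate import quad

# C(t) = [C_P·R(t) + C_F·F(t)] / ∫₀ᵗ R(u)du with Weibull(β=2.5, η=3000).
beta, eta, c_p, c_f = 2.5, 3000.0, 500.0, 8000.0
R = lambda u: np.exp(-(u / eta) ** beta)

def cost_rate(t):
    m, _ = quad(R, 0.0, t)                       # expected cycle length M(t)
    return (c_p * R(t) + c_f * (1.0 - R(t))) / m

ts = np.arange(200.0, 3000.0, 10.0)
rates = [cost_rate(t) for t in ts]
best = int(np.argmin(rates))
print(f"t* ≈ {ts[best]:.0f} hr, minimum cost rate ≈ £{rates[best]:.2f}/hr")
# → roughly t* ≈ 870 hr at ≈ £0.97/hr, vs ≈ £3.0/hr for run-to-failure
```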

Renewal Theory — HPP vs NHPP (NIST 8.3)

Repairable systems restored to "as good as new" follow a Homogeneous Poisson Process (HPP). Partially-repaired systems follow a Non-Homogeneous Poisson Process (NHPP) with time-dependent intensity ρ(t).

HPP — "As Good As New" Repairs
Inter-failure times: iid Exponential(λ)
E[N(t)] = λt
Var[N(t)] = λt
Test: cumulative failures vs t is linear
NHPP — Crow-AMSAA Power Law
ρ(t) = λβt^(β−1)
E[N(t)] = λt^β
β < 1: improving (reliability growth)
β = 1: HPP (constant)
β > 1: worsening (reliability decay)
MLE: β̂ = n / Σᵢ ln(T/tᵢ)
Reliability Growth — AMSAA Crow Example
System tested for T=2,000 hr. 12 failures.

β̂ = 12 / [Σᵢ ln(2000/tᵢ)]
= 12 / 21.4 ≈ 0.560

β̂ = 0.56 < 1 → reliability is growing

Projected failures at T=4,000 hr:
λ̂ = n/T^β̂ = 12/2000^0.56 ≈ 0.170
E[N(4000)] = 0.170 × 4000^0.56 ≈ 17.7 (equivalently 12 × 2^0.56)
📌

MIL-HDBK-189C: Plot cumulative failures vs ln(t) on log-log paper. A straight line confirms the Power Law NHPP. Slope = β. Standard for reliability growth tracking during development testing.
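An MLE sketch for the power-law NHPP; the failure times below are hypothetical but constructed so that Σ ln(T/tᵢ) ≈ 21.4, matching the example:

```python
import numpy as np

# Crow-AMSAA (power-law NHPP) MLE. T = 2,000 hr, n = 12; the times are
# HYPOTHETICAL, chosen so that Σ ln(T/t_i) ≈ 21.4 as in the example.
T = 2000.0
t = np.array([14, 45, 100, 190, 300, 430, 580, 700, 930, 1200, 1520, 1900.0])

beta_hat = len(t) / np.sum(np.log(T / t))       # β̂ = n / Σ ln(T/t_i)
lam_hat = len(t) / T**beta_hat                  # λ̂ = n / T^β̂
print(f"beta ≈ {beta_hat:.2f} ({'improving' if beta_hat < 1 else 'worsening'})")
print(f"projected E[N(4000)] = λ̂·4000^β̂ ≈ {lam_hat * 4000**beta_hat:.1f}")
```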

Sources for tabs 10–13: NIST Engineering Statistics Handbook Sections 8.1.7–8.1.11, 8.2.1, 8.2.4, 8.2.6, 8.3, 8.4 · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · Nelson, Accelerated Testing (Wiley, 1990) · Rausand & Høyland, System Reliability Theory 2nd Ed. (Wiley, 2003) · MIL-HDBK-189C — Reliability Growth Management · MIL-HDBK-470A — Maintainability Design · IEC 60300 Series

Statistical Distributions

A distribution is not just a formula. It is a model of how data behaves: where values cluster, how tails behave, what kinds of outcomes are possible, and what assumptions your downstream analysis is making.

This page is designed as a world-class reference and teaching system: an 8-distribution visual studio, a 30-family continuous catalog, a 9-family discrete catalog, and a selector guide that tells users which distribution to choose and under what conditions.

Essential Distributions Studio — Visual, Formula-Driven, Example-Led

NIST/SEMATECH emphasizes that distribution choice should be supported by graphics and goodness-of-fit checks, including probability plots for competing families. This studio front-loads the distributions engineers use most often and connects each one to a graph, formula, parameter meaning, and actual engineering use case.

Choose the distribution family you want to understand
Start with the family that matches your data type and mechanism. Then validate the choice with plots and process knowledge.

Normal distribution

The normal distribution is the default model for many physical measurements when variation comes from many small additive sources. It is symmetric, bell-shaped, and fully determined by μ and σ.

Continuous · Symmetric · Mean = Median = Mode · Foundation of Cp/Cpk

Real engineering example

Coating thickness across a stable roll-to-roll process often looks approximately normal when the process is centered and major disturbances are absent. That is why capability analysis and Z-based defect estimates often start here.

f(x) = 1 / (σ√(2π)) · exp[-(x−μ)² / (2σ²)]
Bell shape means most values cluster around the center
About 68% lies within ±1σ, 95% within ±2σ, and 99.73% within ±3σ if the process truly follows a normal model.
Normal distribution graph
Support: −∞ to +∞ · Center: μ · Spread: σ · Use: measurements
Condition for use

Use it for continuous measurements when the histogram is approximately symmetric, the tails are not wildly heavy, and the normal probability plot is reasonably straight.

Lognormal distribution

A variable is lognormal when its logarithm is normally distributed. Values are strictly positive and the distribution is right-skewed, often with a long tail.

Continuous · Positive only · Right-skewed · Multiplicative effects

Real engineering example

Cycle times, repair times, particle sizes, and supplier lead times often show lognormal behavior because many multiplicative factors stretch the upper tail.

f(x) = 1 / (xσ√(2π)) · exp[-(ln x − μ)² / (2σ²)], x > 0
Right tail risk matters
A lognormal process can have a perfectly reasonable median while still producing occasional very large values in the upper tail.
Lognormal distribution graph
Condition for use

Use it when values cannot be negative and the upper tail stretches farther than the lower side; especially when multiplicative factors drive the data.

Weibull distribution

Weibull is the workhorse of life-data analysis because its shape parameter β changes the hazard behavior. That makes it useful for infant mortality, random failure, and wear-out.

Reliability · Flexible hazard · Life data · B10/B50/Bx

Real engineering example

Cycles-to-failure of tabs, seal fatigue life, or motor bearing failure times are often modeled with Weibull because the failure pattern changes across the life cycle.

f(x) = (β/η) (x/η)^(β−1) exp[-(x/η)^β], x > 0
β changes the story
β<1 suggests infant mortality, β≈1 behaves like exponential random failure, and β>1 indicates wear-out.
Weibull distribution graph
Condition for use

Use it for life / failure data when the hazard is not obviously constant and you need a flexible reliability model tied to physics of failure.

Exponential distribution

The exponential distribution models waiting times when the event rate is constant. It is memoryless, so the future does not depend on how long you have already waited.

Constant hazard · Waiting times · MTBF

Real engineering example

If rare unscheduled line stoppages occur independently at a roughly constant average rate, time-between-stoppages is often modeled exponentially.

f(x) = λ exp(−λx), x ≥ 0
Steep near zero, then decays
Short waits are more likely than long waits, but the hazard rate stays constant across time.
Exponential distribution graph
Condition for use

Use it for interarrival times and random-failure periods only when the hazard is approximately constant. If hazard changes with age, move to Weibull.

Binomial distribution

The binomial distribution models the number of successes or defectives in a fixed number of independent yes/no trials with the same probability p.

Discrete · Pass / fail · Acceptance sampling

Real engineering example

If you inspect 20 welds and each weld is either acceptable or defective, the number of defectives in the sample is binomial.

P(X=k) = C(n,k) p^k (1−p)^(n−k)
Probability mass over possible defect counts
Unlike continuous distributions, binomial places probability on whole numbers only: 0 defectives, 1 defective, 2 defectives, and so on.
Binomial distribution graph
Condition for use

Use it when you have a fixed number of independent trials, each trial has only two outcomes, and the probability of success/defect is constant.

Poisson distribution

The Poisson distribution models counts of rare events per unit area, time, volume, or opportunity when events occur independently at a constant average rate λ.

Discrete counts · Rare events · c-chart / u-chart logic

Real engineering example

Pinholes per square meter, scratches per panel, voids per electrode sheet, or complaints per day often start with a Poisson model.

P(X=k) = e^(−λ) λ^k / k!
Count distribution with right skew at low λ
When λ is small, zero and low counts dominate. As λ increases, the distribution becomes more symmetric.
Poisson distribution graph
Condition for use

Use it for counts of events per fixed opportunity when events are independent and the average rate is reasonably stable.

Student's t distribution

The t distribution is used when estimating a mean from a small sample and the population standard deviation is unknown. It has heavier tails than the normal distribution.

Inference · Small n · Unknown σ

Real engineering example

Suppose you have only 8 peel-strength results from a pilot line and need a confidence interval for the mean. That interval is built with a t critical value, not a Z critical value.

T = (X̄ − μ) / (S / √n)
Heavier tails protect against small-sample uncertainty
Lower degrees of freedom produce heavier tails. As df grows, the t distribution approaches the normal distribution.
t distribution graph
Condition for use

Use it when the sample is small and population sigma is unknown; it is a reference distribution for inference, not usually the raw data model itself.

Chi-square distribution

The chi-square distribution is built from sums of squared standard normal variables. It appears in variance confidence intervals, chi-square tests, and goodness-of-fit problems.

Variance · GOF tests · Always positive

Real engineering example

If you want a confidence interval for process variance, or you need a chi-square goodness-of-fit test for counts in categories, chi-square is the reference distribution.

χ² = Σ Z_i²
Right-skewed for low df, more spread for higher df
The distribution is always nonnegative because it is built from squared quantities.
Chi-square distribution graph
Condition for use

Use it whenever squared deviations and sample variance are central to the question, such as variance intervals and goodness-of-fit tests.

How to choose rigorously: NIST recommends comparing competing distributions with graphics such as probability plots and checking whether the selected model is consistent with the process mechanism and the observed tail behavior.
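One way to operationalise this advice: fit competing families and compare the straightness (correlation coefficient) of their probability plots. A sketch with hypothetical right-skewed data:

```python
import numpy as np
from scipy import stats

# Compare two candidate families on the same positive, right-skewed sample.
# The data are HYPOTHETICAL, simulated to stand in for e.g. cycle times.
rng = np.random.default_rng(1)
data = rng.lognormal(mean=2.0, sigma=0.5, size=60)

_, (_, _, r_norm) = stats.probplot(data, dist="norm")             # normal fit
_, (_, _, r_logn) = stats.probplot(np.log(data), dist="norm")     # lognormal check
print(f"normal plot r = {r_norm:.3f}, lognormal plot r = {r_logn:.3f}")
# the higher correlation indicates the better-fitting (straighter) family
```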

Continuous Distribution Catalog — 30 Families in Selector Studio

This catalog uses the same click-to-learn approach as the Visual Studio. Select any continuous family to see the formula, symbol explanations, characteristics, use conditions, and a larger visual preview.

continuous · positive

Gamma

f(x)=x^(k−1)e^(−x/θ)/(Γ(k)θ^k), x>0
Condition for use

Positive right-skewed data such as waiting times or accumulated damage.

Symbols

k = shape, θ = scale, Γ(k) = gamma function that generalizes factorial.

Characteristics

Strictly positive, right-skewed, flexible body and tail. As k increases, the curve becomes less skewed and more bell-like.

Real example

Time to absorb moisture to a threshold, service duration, or rainfall-like waiting quantity.

Visual intuition first: use the shape to understand support, symmetry, skew, tail behavior, and whether the distribution is continuous or discrete.

Discrete Distribution Catalog — 9 Families in Selector Studio

Select any discrete family to view its formula, symbol meanings, characteristics, use conditions, and a larger visual preview.

discrete · 0/1

Bernoulli

P(X=1)=p, P(X=0)=1−p
Condition for use

Single pass/fail trial.

Symbols

p = success probability.

Characteristics

Only two outcomes are possible. It is the atomic building block for binomial-type models.

Real example

One weld acceptable or not; one part passes or fails.

Visual intuition first: use the shape to understand support, symmetry, skew, tail behavior, and whether the distribution is continuous or discrete.

Selector Guide — Which Distribution Should I Use?

Start with the data type, then the mechanism, then the shape. This is the practical decision flow quality engineers need.

Continuous measurement, symmetric histogram

Start with Normal. Confirm with a histogram and normal probability plot.

Continuous, positive only, strong right skew

Check Lognormal, Gamma, Weibull, or Log-logistic. Use process mechanism to decide.

Time-to-failure or cycles-to-failure

Start with Weibull. Use Exponential only if the hazard appears constant. Consider Lognormal when multiplicative degradation dominates.

Pass/fail counts in fixed sample size

Use Binomial. If sampling is without replacement from a finite lot, use Hypergeometric.

Defects per unit / event counts per time

Use Poisson for rare-event counts. If variance is much larger than the mean, consider Negative Binomial.

Need confidence interval for mean with small n

Use the t distribution for the inferential step, even if the underlying raw process data are approximately normal.

Need variance interval or goodness-of-fit test

Use Chi-square. For ANOVA or variance-ratio tests, use F.

Bounded proportion from 0 to 1

Use Beta or a transform-normal bounded family such as Johnson SB when shape flexibility is needed.

Best-practice workflow

1) Plot the data. 2) Use process knowledge to narrow the candidate families. 3) Compare competing fits with probability plots or fit statistics. 4) Choose the simplest defensible model that matches both the data and the mechanism.

Design of Experiments

Design of Experiments (DOE)

A practical guide to DOE — from foundational concepts through full factorial, fractional factorial, Taguchi, and mixture designs. Every concept is illustrated with real worked examples. Pioneered by Sir Ronald A. Fisher in the 1920s and extended by Taguchi, Box, Plackett & Burman — DOE remains the most powerful process optimisation tool available to quality engineers.

What is Design of Experiments?

DOE is the simultaneous study of several process variables. Rather than changing one factor at a time, you combine multiple factors in one study — drastically reducing the amount of testing required while gaining far deeper process understanding. It is primarily a logic tool, not an advanced mathematics tool.

The Process Model — Inputs, Process & Output
[Process model diagram] INPUTS — X's (independent variables): People, Machines, Materials, Methods, Environment, Measurements → PROCESS: controlled factors (settings you can change) and noise factors (uncontrolled variation) → OUTPUT — Y (response): maximize response, minimize response, hit a target value, reduce variation, make the process robust.

Why NOT One-Factor-At-A-Time (OFAT)?

❌ OFAT — One Factor at a Time
▸ Change Temperature → measure
▸ Change Pressure → measure
▸ Change Speed → measure
▸ Cannot detect interactions between factors
▸ Wastes runs — misleading conclusions possible
✓ DOE — Simultaneous Study
▸ All combinations tested together
▸ Same data used for multiple factors
▸ Detects interactions between factors
▸ Fewer total runs for same information
▸ Builds a predictive model of the process

The 9 Steps for Analysis of Effects

Every experiment in this module follows these nine analytical steps. Steps 3–6 are skipped for unreplicated experiments, attribute data, and Taguchi S/N ratio analyses — a half-normal plot is used instead.

Nine Steps — Universal DOE Analysis Framework
Step 1 — Calculate absolute values of effects
Step 2 — Pareto chart of absolute effects
Step 3 — Std deviation of the experiment (sₑ)
Step 4 — Std deviation of effects (sEff)
Step 5 — Determine the t-statistic (t at α/2, df)
Step 6 — Decision limits (DL = t × sEff)
Step 7 — Determine significant effects
Step 8 — Graph main effects & interactions
Step 9 — Model effects → prediction equation

The 6 Objectives of DOE

📈 Maximize Response
Find settings that produce the highest output — e.g., maximum bond strength
📉 Minimize Response
Find settings that produce the lowest output — e.g., minimum defects or corrosion
🎯 Hit a Target
Adjust the process to achieve a nominal value — e.g., target wall thickness of 2.0mm
⬇️ Reduce Variation
Find settings that produce the most consistent output — lower σ, higher Cpk
🛡️ Make Process Robust
Make the response insensitive to uncontrollable noise factors — temperature drift, humidity, etc.
🔍 Identify Key Factors
Determine which variables are truly important (vital few vs. trivial many)

Key Concepts & Vocabulary

DOE has its own precise vocabulary. Understanding these terms is essential — both for exam questions and for reading DOE results correctly.

Term | Definition | Example
Factor | A controllable input variable (X) that may affect the response. Also called independent variable. | Temperature, Pressure, Vendor, Catalyst concentration
Level | The specific setting or value used for a factor in an experiment. Two-level designs use High (+) and Low (−). | Temperature: Low = 580°F, High = 600°F
Response | The output (Y) being measured and improved. Also called dependent variable. | Bond strength, Yield %, Weight loss, Hardness
Run / Treatment | A unique combination of factor levels. Each run may be performed more than once. | A+ B− = High Temp + Vendor Y
Replication | An independent repeat of a run that includes a completely new setup. Provides an estimate of inherent variation. | Running A+B+ three times from scratch
Repeat | Repetition of a run WITHOUT a new setup. Not the same as replication — does not estimate experimental error independently. | Running the same conditions back-to-back without reset
Full Factorial (2ᵏ) | All possible combinations of factor levels. 2 factors × 2 levels = 4 runs (2²). 3 factors = 8 runs (2³). | 2² design: 4 unique treatments
Main Effect | The average change in response when moving a factor from its low to its high level, averaged across all levels of other factors. | E(A) = Ȳ(A+) − Ȳ(A−) = +2.05 units
Interaction | When the effect of one factor depends on the level of another factor. If interactions are significant, the interaction plot is more meaningful than the main effect plots. | Temperature effect is +5.1 with Vendor X but −1.0 with Vendor Y
Confounding / Alias | When two effects are indistinguishable from each other because they produce identical sign patterns in the design matrix. | In a ½ fraction of a 2³, C is confounded with AB
Resolution | Describes the severity of confounding. Resolution III: main effects aliased with 2-factor interactions. Resolution V: 2-factor interactions not aliased with each other. | Res III = screening only; Res V = can estimate all interactions
Randomization | Running trials in random order to protect against unknown time-related trends or disturbances. The "insurance policy" against misleading results. | Draw numbered cards from a hat to determine run order
Blocking | Grouping experimental runs to account for a known source of variation that cannot be randomized (e.g., different batches of raw material). | Run half the trials with Batch 1, half with Batch 2
Center Points | Runs at the midpoint of all factor levels (coded value = 0). Used to detect nonlinearity/curvature and increase degrees of freedom. | If Temp range is 580–600°F, center point = 590°F
Residual | The difference between the actual observed response and the value predicted by the model. Used to validate model assumptions. | Residual = Observed − Predicted
Inherent Variation | The random background noise of a process. In DOE = "experimental error." In SPC = "common cause variation." | The natural process scatter that is always present

Quantitative vs Qualitative Factors

Quantitative Factors

Levels can be set along a continuous measurement scale. Preferred because they allow interpolation and optimization across the range. Example: Temperature (580–600°F), Time (45–90 sec), Concentration (10%–20%).

Qualitative Factors

Levels are discrete categories — a finite number of options with no natural numeric order. Example: Vendor (X vs Y), Machine type (A vs B), Operator (Shift 1 vs Shift 2). Cannot interpolate between levels.

Coded Values — The +1 / −1 System

DOE encodes factor levels as −1 (low), 0 (center), and +1 (high). This allows the same mathematical framework to work for any factor regardless of its physical units.

🔢 Coded Value Scale for Temperature (580–600°F)
[Scale diagram] −1 (Low) = 580°F · 0 (Center) = 590°F · +1 (High) = 600°F; intermediate points: 585°F = −0.5, 595°F = +0.5

Statistical Foundations for DOE

Hypothesis Testing — Type I and Type II Errors

Every DOE conclusion is a hypothesis test. Understanding error types and risks is fundamental to interpreting results correctly.

Decision | H₀ is TRUE (no real effect) | H₀ is FALSE (real effect exists)
Accept H₀ (fail to reject) | ✓ Correct — Probability = 1 − α | ✗ Type II Error — Probability = β (miss a real effect)
Reject H₀ | ✗ Type I Error — Probability = α (false alarm) | ✓ Correct — Probability = 1 − β (Power)
α (Alpha) Risk — Type I Error

Claiming a significant effect when there isn't one. A false alarm. Typical α = 0.05 means you'll incorrectly claim significance 5 times in 100.

Common: α = 0.10, 0.05, 0.01
β (Beta) Risk — Type II Error

Missing a real effect — declaring no significance when a real difference exists. Power = 1 − β. Increase sample size to reduce β.

Power = 1 − β → want this high

One-Tail vs Two-Tail Tests

📊 Three Types of Hypothesis Tests
[Diagram: rejection regions for the three test types]
Upper-tail test — H₀: μ₁ ≤ μ₂, H_A: μ₁ > μ₂. Reject H₀ if Z_calc > Z_crit (all of α in the upper tail).
Lower-tail test — H₀: μ₁ ≥ μ₂, H_A: μ₁ < μ₂. Reject H₀ if Z_calc < Z_crit (all of α in the lower tail).
Two-tail test — H₀: μ₁ = μ₂, H_A: μ₁ ≠ μ₂. Reject H₀ if |Z_calc| > Z_crit (α/2 in each tail, limits −DL and +DL).

Normal Probability Plots — Recognising Patterns

If data is normally distributed, points fall on a straight line. Deviations from the line reveal the distribution's character. The "pencil test": if a pencil covers all the points, the data is approximately normal.

📈 Normal Probability Plot — Four Common Patterns
[Diagram: four probability-plot patterns]
Normal: points fall on the line ✓
Right-skewed: bends up-left → long right tail
Left-skewed: bends down-right → long left tail
Short tails: S-shape → less variance than normal

Dean & Dixon Outlier Test

Used to detect outliers in normally distributed data before running a DOE. Data must be sorted smallest to largest. The formula used depends on sample size.

n | Test Statistic for Smallest Value | Test Statistic for Largest Value | Decision Rule
3 to 7 | r₁₀ = (X₂ − X₁) / (Xₙ − X₁) | r₁₀ = (Xₙ − Xₙ₋₁) / (Xₙ − X₁) | If r_calc > r_crit → outlier at chosen α
8 to 10 | r₁₁ = (X₂ − X₁) / (Xₙ₋₁ − X₁) | r₁₁ = (Xₙ − Xₙ₋₁) / (Xₙ − X₂) | (same rule)
11 to 13 | r₂₁ = (X₃ − X₁) / (Xₙ₋₁ − X₁) | r₂₁ = (Xₙ − Xₙ₋₂) / (Xₙ − X₂) | (same rule)
14 to 30 | r₂₂ = (X₃ − X₁) / (Xₙ₋₂ − X₁) | r₂₂ = (Xₙ − Xₙ₋₂) / (Xₙ − X₃) | (same rule)
📝

Worked Example (n = 10): Data: 1, 3, 6, 7, 8, 9, 10, 11, 12, 23. For largest value: r₁₁ = (23 − 12)/(23 − 3) = 11/20 = 0.550. Critical value r₁₁ at α=0.05 = 0.477. Since 0.550 > 0.477, the value 23 IS an outlier at 95% confidence. The smallest value 1 gives r₁₁ = 0.182 < 0.477 — not an outlier.
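The test is easy to script; this sketch reproduces the n = 10 worked example:

```python
# Dean & Dixon r11 test (valid for n = 8–10), reproducing the worked example.
data = sorted([1, 3, 6, 7, 8, 9, 10, 11, 12, 23])

r11_high = (data[-1] - data[-2]) / (data[-1] - data[1])   # suspect largest value
r11_low = (data[1] - data[0]) / (data[-2] - data[0])      # suspect smallest value
r_crit = 0.477                                            # critical value, n=10, α=0.05

print(f"largest:  r11 = {r11_high:.3f} -> {'outlier' if r11_high > r_crit else 'keep'}")
print(f"smallest: r11 = {r11_low:.3f} -> {'outlier' if r11_low > r_crit else 'keep'}")
# largest: 0.550 > 0.477 -> outlier; smallest: 0.182 < 0.477 -> keep
```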

Analysis of Variance (ANOVA)

ANOVA partitions the total variation in a dataset into components from different sources. It tests whether three or more group means are equal — a generalisation of the t-test. It produces an F-statistic: the ratio of between-group variance to within-group variance.

One-Way ANOVA — Testing One Factor

Tests whether a single factor (with 3+ levels) significantly affects the response. Assumptions: normality, independence, equal variances, interval data.

📊 One-Way ANOVA — Pressure Example (100, 110, 120 psi)
[Dot plot: group means — 100 psi x̄ = 8.2, 110 psi x̄ = 5.2, 120 psi x̄ = 4.0 — against the grand mean; F_calc = 10.97 > F_crit = 3.89 → Reject H₀]
Source | SS | df | MS | F Calculated | F Critical | Decision
Between groups | 46.8 | 2 | 23.4 | 10.97 | 3.89 | Reject H₀
Within groups (error) | 25.6 | 12 | 2.1 | | |
Total | 72.4 | 14 | | | |

Two-Way ANOVA — Testing Two Factors + Interaction

Extends one-way ANOVA to test two factors simultaneously AND their interaction. Example: Press (2 levels) × Dwell Time (3 levels).

Source | SS | df | MS | F Calculated | F Critical (α=0.05) | Decision
Rows (Press) | 1.4 | 1 | 1.4 | 0.74 | 4.75 | Fail to reject — Press NOT significant
Columns (Dwell time) | 46.3 | 2 | 23.2 | 12.21 | 3.89 | Reject H₀ — Dwell time IS significant
Rows × Columns (Interaction) | 3.5 | 2 | 1.8 | 0.95 | 3.89 | Fail to reject — No significant interaction
Within (error) | 23.3 | 12 | 1.9 | | |
Total | 74.5 | 17 | | | |
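The F critical values used in both ANOVA tables can be verified with scipy:

```python
from scipy.stats import f

# F critical values at α = 0.05 for the two ANOVA tables above.
print(f"F_crit(2, 12) = {f.ppf(0.95, 2, 12):.2f}")   # 3.89 — one-way test & dwell-time columns
print(f"F_crit(1, 12) = {f.ppf(0.95, 1, 12):.2f}")   # 4.75 — press rows
```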

2-Factor Full Factorial — Completely De-mystified

A 2² full factorial is the simplest true experiment. Two factors, each at two levels. Four unique combinations. Run them all — then the mathematics tells you exactly which factors matter, how much, and whether they interact. No guessing. No one-factor-at-a-time (OFAT) blindness.

Engineering Study — The Problem
Injection Moulding — Weld Line Strength

A plastics engineer is investigating weld line strength (MPa) in injection-moulded parts. Weld lines form where two flow fronts meet and are a known weak point. Two factors are suspected to influence strength: Melt Temperature and Injection Speed.

Goal: maximise weld line strength. Budget: 12 shots total. Each of the 4 combinations is run 3 times (replicated).

Factor Levels
Factor A — Melt Temperature
Low (−1): 230°C    High (+1): 260°C
Factor B — Injection Speed
Low (−1): 40 mm/s    High (+1): 80 mm/s
Response Y — Weld Line Strength
Units: MPa    Objective: Maximise

Step 1 — The Design Matrix & Experimental Data

Run all 4 combinations in random order (to prevent time-trend bias). Replicate each 3 times. Record the weld line strength for each shot. These are the actual results from the study:

Run | A (Temp) | B (Speed) | Coded A | Coded B | Rep 1 (MPa) | Rep 2 (MPa) | Rep 3 (MPa) | Mean Ȳ | s² (Variance)
1 | 230°C | 40 mm/s | −1 | −1 | 28.4 | 27.9 | 28.8 | 28.37 | 0.205
2 | 260°C | 40 mm/s | +1 | −1 | 33.1 | 34.0 | 33.5 | 33.53 | 0.203
3 | 230°C | 80 mm/s | −1 | +1 | 31.2 | 30.5 | 31.8 | 31.17 | 0.423
4 ★ | 260°C | 80 mm/s | +1 | +1 | 38.6 | 39.2 | 38.9 | 38.90 | 0.090

Step 2 — Visualise the Design Space

Plot the four treatment means on a 2D square. Each corner is one combination. The response values immediately reveal the pattern — and hint at whether an interaction exists.

[Diagram: 2² design square — axes: Factor A (Melt Temperature, 230°C → 260°C) and Factor B (Injection Speed, 40 → 80 mm/s). Corners: Run 1 = 28.4 MPa, Run 2 = 33.5 MPa, Run 3 = 31.2 MPa, Run 4 ★ BEST = 38.9 MPa. Effect of A at B−: 33.5 − 28.4 = +5.1 MPa; at B+: 38.9 − 31.2 = +7.7 MPa. 5.1 ≠ 7.7 → interaction!]

Step 3 — Calculate the Three Effects

Every 2² factorial has exactly three estimable effects: Main Effect A, Main Effect B, and Interaction AB. The formula is always the same: Effect = (average of high-level runs) − (average of low-level runs).

Main Effect A — Temperature
E(A) = Ȳ(A+) − Ȳ(A−)
Ȳ(A+) = (33.53 + 38.90)/2 = 36.22
Ȳ(A−) = (28.37 + 31.17)/2 = 29.77
E(A) = 36.22 − 29.77 = +6.45 MPa
Raising temperature from 230→260°C increases strength by 6.45 MPa on average.
Main Effect B — Injection Speed
E(B) = Ȳ(B+) − Ȳ(B−)
Ȳ(B+) = (31.17 + 38.90)/2 = 35.03
Ȳ(B−) = (28.37 + 33.53)/2 = 30.95
E(B) = 35.03 − 30.95 = +4.08 MPa
Increasing speed from 40→80 mm/s increases strength by 4.08 MPa on average.
Interaction Effect AB
E(AB) = Ȳ(same-sign runs) − Ȳ(opposite-sign runs)
Ȳ(same) = [Ȳ(++) + Ȳ(−−)]/2 = (38.90 + 28.37)/2 = 33.64
Ȳ(opposite) = [Ȳ(+−) + Ȳ(−+)]/2 = (33.53 + 31.17)/2 = 32.35
E(AB) = 33.64 − 32.35 = +1.29 MPa
Interaction: the combination of high temp + high speed gives extra benefit beyond additivity.
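All three effects fall out of the four run means and a handful of additions. A minimal sketch, using the run means from Step 1:

# Effect calculation for the 2x2 weld-line study, from the Step 1 run means.
# Keys are (coded A, coded B); values are mean weld line strength in MPa.
y = {(-1, -1): 28.37, (+1, -1): 33.53, (-1, +1): 31.17, (+1, +1): 38.90}

effect_A  = (y[+1, -1] + y[+1, +1]) / 2 - (y[-1, -1] + y[-1, +1]) / 2
effect_B  = (y[-1, +1] + y[+1, +1]) / 2 - (y[-1, -1] + y[+1, -1]) / 2
effect_AB = (y[+1, +1] + y[-1, -1]) / 2 - (y[+1, -1] + y[-1, +1]) / 2

print(f"E(A) = {effect_A:+.2f}, E(B) = {effect_B:+.2f}, E(AB) = {effect_AB:+.2f}")
# Matches the hand calculation: +6.45, +4.08, +1.29 MPa (up to rounding).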

Step 4 — Test for Statistical Significance

An effect that looks large might just be noise. The decision limit (DL) separates real effects from random variation. Any effect whose absolute value exceeds DL is statistically significant.

Decision Limit Calculation — Step by Step
① Experimental std dev (sₑ)
sₑ = √(mean of all variances)
= √((0.205+0.203+0.423+0.090)/4)
= √(0.230) = 0.480 MPa
② Std dev of effects (sEff)
sEff = sₑ × √(4/N)
= 0.480 × √(4/12)
= 0.480 × 0.577 = 0.277 MPa
③ Degrees of freedom
df = (reps − 1) × runs
= (3 − 1) × 4 = 8 df
④ Decision Limit (α=0.05)
DL = t(0.025, 8df) × sEff
= 2.306 × 0.277
= ±0.639 MPa
Effect | Calculated Value | |Value| | Decision Limit | Significant? | Engineering Conclusion
A — Temperature | +6.45 MPa | 6.45 | ±0.639 | ✓ YES | Temperature is the dominant factor. Run at 260°C.
B — Injection Speed | +4.08 MPa | 4.08 | ±0.639 | ✓ YES | Speed matters. Run at 80 mm/s.
AB — Interaction | +1.29 MPa | 1.29 | ±0.639 | ✓ YES | Synergy: A+B+ together gives extra benefit.
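The decision limit is equally mechanical. This sketch (assumes SciPy for the t quantile) reproduces the four boxes above:

# Decision-limit sketch for the weld-line study (Step 4), using the
# per-run variances from the Step 1 table.
import math
from scipy import stats

variances = [0.205, 0.203, 0.423, 0.090]   # s^2 per run
n_trials = 12                              # 4 runs x 3 replicates

s_e   = math.sqrt(sum(variances) / len(variances))   # experimental std dev
s_eff = s_e * math.sqrt(4 / n_trials)                # std dev of an effect
df    = (3 - 1) * 4                                  # (reps - 1) x runs
t_crit = stats.t.ppf(1 - 0.025, df)                  # two-sided alpha = 0.05

print(f"DL = +/-{t_crit * s_eff:.3f} MPa")           # +/-0.639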

Step 5 — The Interaction Plot (Most Important Graph in DOE)

When an interaction is significant, the main effects alone are misleading. Plot the response at each combination — one line per level of Factor B. Non-parallel lines = interaction. Crossing lines = strong interaction where the best level of A depends on B.

[Interaction plot: weld line strength (MPa, axis 25–45) vs melt temperature (230°C → 260°C), one line per speed. B− (40 mm/s): 28.4 → 33.5, ΔA = +5.1 MPa. B+ (80 mm/s): 31.2 → 38.9, ΔA = +7.7 MPa. Lines NOT parallel → interaction confirmed: 5.1 MPa at B− vs 7.7 MPa at B+.]

Step 6 — The Prediction Equation & Optimal Settings

Prediction Equation (coded units)
Ŷ = Grand mean + C_A·A + C_B·B + C_AB·AB

C_A = E(A)/2 = 6.45/2 = 3.225
C_B = E(B)/2 = 4.08/2 = 2.040
C_AB = E(AB)/2 = 1.29/2 = 0.645
Grand mean = (28.37+33.53+31.17+38.90)/4 = 32.99

Ŷ = 32.99 + 3.225A + 2.040B + 0.645AB
Optimal Prediction: A=+1, B=+1
Ŷ = 32.99 + 3.225(+1) + 2.040(+1) + 0.645(+1)(+1)
= 32.99 + 3.225 + 2.040 + 0.645
= 38.90 MPa ✓ (matches Run 4)
Interpolation: A=+0.5 (245°C), B=+1
Ŷ = 32.99 + 3.225(0.5) + 2.040(1) + 0.645(0.5)(1)
= 32.99 + 1.613 + 2.040 + 0.323
= 36.97 MPa
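The fitted model is just a four-term polynomial in coded units, so prediction is a one-line function:

# Prediction sketch for the fitted weld-line model (coded units).
def predict(a: float, b: float) -> float:
    """Predicted weld line strength (MPa) at coded settings a, b in [-1, +1]."""
    return 32.99 + 3.225 * a + 2.040 * b + 0.645 * a * b

print(f"{predict(+1.0, +1.0):.3f}")   # 38.900 -- matches Run 4
print(f"{predict(+0.5, +1.0):.3f}")   # 36.965 -- the ~36.97 MPa interpolation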
Engineering Conclusion

Run at 260°C melt temperature and 80 mm/s injection speed. Both main effects are significant and positive. The positive interaction (AB = +1.29) means the two factors work better together than the sum of their individual effects — there is a genuine synergy at the high-high combination. Setting A=+1, B=+1 gives the maximum predicted strength of 38.90 MPa — a 37% improvement over the worst combination (28.37 MPa at A−B−).

3-Factor Experiments — Full, Half, Quarter & Plackett-Burman

Adding a third factor multiplies complexity but unlocks far more information. A 2³ full factorial estimates 7 effects from 8 runs. When resources are limited, fractional designs cut runs in half (or more) by making smart aliasing trade-offs. This tab works one engineering study through all four design types so you can see exactly what each gives you — and what each costs you.

Engineering Study — PCB Solder Joint Strength

A process engineer is investigating solder joint shear strength (N) on a circuit board assembly line. Three process factors are suspected. The goal is to identify which factors matter and set them to maximise strength. Response: joint shear strength (N). Objective: Maximise.

Why This Study?

Weak solder joints cause field failures. One-factor-at-a-time testing found that increasing temperature helped — but only sometimes. That inconsistency is the signature of an interaction. DOE will find it.

Three Factors — Two Levels Each
A — Solder Temperature
−1: 245°C     +1: 265°C
B — Conveyor Speed
−1: 0.8 m/min     +1: 1.4 m/min
C — Flux Type
−1: Type R (rosin)     +1: Type RMA
2³ Full Factorial
8 runs · Estimates ALL 7 effects · No aliasing · Resolution = Full

Design Matrix & Data — Full Factorial

Std Order | A (Temp) | B (Speed) | C (Flux) | AB | AC | BC | ABC | Y₁ (N) | Y₂ (N) | Ȳ
1 | − | − | − | + | + | + | − | 41.2 | 40.8 | 41.00
2 | + | − | − | − | − | + | + | 49.6 | 50.2 | 49.90
3 | − | + | − | − | + | − | + | 43.1 | 42.5 | 42.80
4 | + | + | − | + | − | − | − | 55.8 | 56.4 | 56.10
5 | − | − | + | + | − | − | + | 44.3 | 43.9 | 44.10
6 | + | − | + | − | + | − | − | 52.1 | 51.7 | 51.90
7 | − | + | + | − | − | + | − | 45.6 | 46.2 | 45.90
8 ★ | + | + | + | + | + | + | + | 60.3 | 61.1 | 60.70

Calculating All 7 Effects

For any effect, the formula is: Effect = (average of Ȳ where that column = +) − (average of Ȳ where that column = −). Use the sign column for each effect:

Effect | + Runs (means) | Avg(+) | − Runs (means) | Avg(−) | Effect Value | |Effect|
A — Temperature | 49.90, 56.10, 51.90, 60.70 | 54.65 | 41.00, 42.80, 44.10, 45.90 | 43.45 | +11.20 N | 11.20
B — Speed | 42.80, 56.10, 45.90, 60.70 | 51.38 | 41.00, 49.90, 44.10, 51.90 | 46.73 | +4.65 N | 4.65
C — Flux Type | 44.10, 51.90, 45.90, 60.70 | 50.65 | 41.00, 49.90, 42.80, 56.10 | 47.45 | +3.20 N | 3.20
AB — Temp × Speed | 41.00, 56.10, 44.10, 60.70 | 50.48 | 49.90, 42.80, 51.90, 45.90 | 47.63 | +2.85 N | 2.85
AC — Temp × Flux | 41.00, 42.80, 51.90, 60.70 | 49.10 | 49.90, 56.10, 44.10, 45.90 | 49.00 | +0.10 N | 0.10
BC — Speed × Flux | 41.00, 49.90, 45.90, 60.70 | 49.38 | 42.80, 56.10, 44.10, 51.90 | 48.73 | +0.65 N | 0.65
ABC — 3-way | 49.90, 42.80, 44.10, 60.70 | 49.38 | 41.00, 56.10, 51.90, 45.90 | 48.73 | +0.65 N | 0.65
Pareto Chart — Absolute Effects vs Decision Limit
[Pareto of |effect| against the DL: A (11.20), B (4.65), C (3.20) and AB (2.85) stand clearly above the limit; BC and ABC (0.65 each) sit only just above it; AC (0.10) is noise.]
Decision Limit Calculation: each run has two replicates, so the run variances come from the Y₁/Y₂ pairs · sₑ = √(mean run variance) = √0.16 = 0.40 N · sEff = 0.40 × √(4/16) = 0.20 N · df = (2−1) × 8 = 8 · t(0.025, 8) = 2.306 · DL = 2.306 × 0.20 = ±0.46 N · Clearly significant: A (+11.20), B (+4.65), C (+3.20), AB (+2.85). BC and ABC (+0.65) are marginal and AC (+0.10) is well below the limit: treat A, B, C and AB as the active effects.
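Each of the seven effects is the same high-minus-low average, just with a different sign column. The sketch below recomputes all of them from the run means using sign products directly:

# Sketch: all seven 2^3 effects for the solder-joint study, computed from
# the run means with sign products. Keys are (coded A, coded B, coded C).
means = {(-1,-1,-1): 41.00, (+1,-1,-1): 49.90, (-1,+1,-1): 42.80, (+1,+1,-1): 56.10,
         (-1,-1,+1): 44.10, (+1,-1,+1): 51.90, (-1,+1,+1): 45.90, (+1,+1,+1): 60.70}

def effect(sign_of):
    hi = [y for run, y in means.items() if sign_of(run) > 0]
    lo = [y for run, y in means.items() if sign_of(run) < 0]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

columns = {"A": lambda r: r[0], "B": lambda r: r[1], "C": lambda r: r[2],
           "AB": lambda r: r[0] * r[1], "AC": lambda r: r[0] * r[2],
           "BC": lambda r: r[1] * r[2], "ABC": lambda r: r[0] * r[1] * r[2]}

for name, sign_of in columns.items():
    print(f"{name:>3}: {effect(sign_of):+.2f} N")
# A +11.20, B +4.65, C +3.20, AB +2.85, AC +0.10, BC +0.65, ABC +0.65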
2³⁻¹ Half Fraction
4 runs · Resolution III · Generator: C = AB · Main effects aliased with 2FIs

A half fraction runs 4 of the 8 full factorial runs. We choose which 4 by defining a generator: C = AB. This means column C is the same as column AB — so we cannot tell C apart from the AB interaction. This is called aliasing.

Alias Structure — What Gets Confounded
Generator: I = ABC
A ↔ BC  (A is aliased with BC)
B ↔ AC  (B is aliased with AC)
C ↔ AB  (C is aliased with AB)
This is Resolution III: main effects are aliased with 2-factor interactions. If BC is small (the full factorial put it at only +0.65 N), the estimate of A is nearly clean. But we must assume this — we cannot verify it from the half fraction alone.

Half Fraction Design Matrix (Std Orders 2, 3, 5, 8 from the Full Factorial: the runs with ABC = +1)

Run | A | B | C=AB | Y₁ (N) | Y₂ (N) | Ȳ | Note
1 | − | − | + | 44.3 | 43.9 | 44.10 | A−B−C+
2 | + | − | − | 49.6 | 50.2 | 49.90 | A+B−C−
3 | − | + | − | 43.1 | 42.5 | 42.80 | A−B+C−
4 ★ | + | + | + | 60.3 | 61.1 | 60.70 | A+B+C+
Effects from Half Fraction (each aliased with one 2FI)
l₁ = A + BC estimate
(49.90+60.70)/2 − (44.10+42.80)/2
= 55.30 − 43.45 = +11.85 N
≈ E(A) from full = 11.20 ✓ (the +0.65 bias is exactly BC)
l₂ = B + AC estimate
(42.80+60.70)/2 − (44.10+49.90)/2
= 51.75 − 47.00 = +4.75 N
≈ E(B) from full = 4.65 ✓ (the +0.10 bias is exactly AC)
l₃ = C + AB estimate
(44.10+60.70)/2 − (49.90+42.80)/2
= 52.40 − 46.35 = +6.05 N
⚠ C=3.20 + AB=2.85 = 6.05 — INFLATED by aliasing!
Key Lesson: The half fraction correctly identifies A and B as important (estimates close to full factorial). But the C estimate is inflated (+6.05 instead of +3.20) because it contains the AB interaction (+2.85). If you don't know AB is significant, you might wrongly conclude C is the most important factor after A. This is the aliasing trap — always check the alias structure before interpreting results.
Quarter Fraction — 2ᵏ⁻² Design
Practical from k≥5 factors · Example: 2⁵⁻² = 8 runs for 5 factors

A quarter fraction uses ¼ of the full factorial runs. For 3 factors, a quarter fraction would be only 2 runs — not useful. Quarter fractions become practical at 5+ factors: a 2⁵ full factorial needs 32 runs, but a 2⁵⁻² needs only 8 runs.

Quarter Fraction Formula & Construction
Number of runs:
N = 2ᵏ⁻² = 2ᵏ / 4
Two generators needed:
Example for 2⁵⁻²:
Generator 1: D = AB
Generator 2: E = AC
Defining relation: I = ABD = ACE = BCDE
Alias structure (Resolution III):
A ↔ BD ↔ CE
B ↔ AD ↔ CDE
C ↔ AE ↔ BDE
D ↔ AB ↔ BCE
E ↔ AC ↔ BCD
5 main effects estimable from 8 runs. The price: heavy aliasing — only use for initial screening.
Design | Factors | Runs | Resolution | What you can estimate | What's aliased
Full 2ᵏ | k | 2ᵏ | Full | All main effects AND all interactions | Nothing — complete information
Half 2ᵏ⁻¹ | k | 2ᵏ/2 | III or IV | All main effects (if Res IV); some 2FIs | Some 2FIs aliased with each other (Res IV) or with main effects (Res III)
Quarter 2ᵏ⁻² | k | 2ᵏ/4 | III | All main effects (assuming 2FIs negligible) | Main effects aliased with 2FIs — screening only
Plackett-Burman | up to N−1 | 12, 20, 24… | III | All main effects | Each main effect partially confounded with ALL 2FIs not involving it
Plackett-Burman (PB) Design
12-run design · Screens up to 11 factors · Non-geometric · Resolution III

Plackett-Burman designs are non-geometric screening designs: the run count is a multiple of 4 (not a power of 2). The 12-run PB can screen up to 11 factors — far more efficient than any 2ᵏ fractional design. The trade-off: each main effect is partially confounded with every two-factor interaction not containing that factor.

PB12 — Applied to Our Solder Study (Extended to 5 Factors)

We extend the solder study by adding 2 more factors: D = Preheat Time (30s vs 60s) and E = Board Orientation (flat vs angled). Now 5 factors. Full factorial = 32 runs. PB12 = 12 runs.

PB12 Design Matrix — First Row & Cyclic Construction

The PB12 is constructed by cycling this first row: + + − + + + − − − + −. Each subsequent row is a cyclic right-shift. Row 12 is all minuses.

Run | A (Temp) | B (Speed) | C (Flux) | D (Preheat) | E (Orient) | F* | G* | H* | J* | K* | L* | Y (N)
1 | + | + | − | + | + | + | − | − | − | + | − | 53.2
2 | − | + | + | − | + | + | + | − | − | − | + | 44.8
3 | + | − | + | + | − | + | + | + | − | − | − | 58.1
4 | − | + | − | + | + | − | + | + | + | − | − | 42.3
5 | − | − | + | − | + | + | − | + | + | + | − | 43.1
6 | − | − | − | + | − | + | + | − | + | + | + | 40.5
7 | + | − | − | − | + | − | + | + | − | + | + | 51.9
8 | + | + | − | − | − | + | − | + | + | − | + | 55.6
9 | + | + | + | − | − | − | + | − | + | + | − | 59.4
10 | − | + | + | + | − | − | − | + | − | + | + | 46.2
11 | + | − | + | + | + | − | − | − | + | − | + | 57.3
12 | − | − | − | − | − | − | − | − | − | − | − | 39.8
* Columns F–L are unused dummy columns in this 5-factor study. They can be used to estimate experimental error or screen additional factors.
PB Effect Calculation — Same Formula as Full Factorial

Effect of any factor = (mean of Y where that column is +) − (mean of Y where that column is −):

Factor | + Runs | Mean(+) | Mean(−) | Effect Estimate | Screening Decision
A — Temperature | 1,3,7,8,9,11 | 55.92 | 42.78 | +13.13 N | ✓ INCLUDE — large, positive
B — Speed | 1,2,4,8,9,10 | 50.25 | 48.45 | +1.80 N | Small — follow up only if budget allows
C — Flux Type | 2,3,5,9,10,11 | 51.48 | 47.22 | +4.27 N | ✓ INCLUDE — moderate
D — Preheat Time | 1,3,4,6,10,11 | 49.60 | 49.10 | +0.50 N | Not significant — set by convenience
E — Orientation | 1,2,5,7,9,11 | 51.62 | 47.08 | +4.53 N | ✓ INCLUDE — moderate
PB Screening Conclusion: Temperature (A) is clearly the most important factor (+13.1 N). Orientation (E) and Flux (C) are moderate and worth investigating further; Speed (B) is small and Preheat Time (D) negligible. Next step: run a focused follow-up experiment on A, C and E — a full 2³ factorial with replication — to estimate interactions. This is the sequential experimentation strategy: screen broadly with PB, then characterise deeply with full factorial on the shortlisted factors.
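The cyclic construction and the effect estimates take only a few lines of code. This sketch builds the PB12 matrix from the stated first row and reproduces the effect column above:

# Sketch: build the PB12 matrix by cycling the stated first row, then
# estimate each main effect as mean(+) - mean(-).
first = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]
rows = [first]
for _ in range(10):                       # cyclic right-shift gives rows 2..11
    prev = rows[-1]
    rows.append([prev[-1]] + prev[:-1])
rows.append([-1] * 11)                    # row 12: all minus

y = [53.2, 44.8, 58.1, 42.3, 43.1, 40.5, 51.9, 55.6, 59.4, 46.2, 57.3, 39.8]

for col, factor in enumerate("ABCDE"):    # the first five columns carry factors
    hi = [yi for row, yi in zip(rows, y) if row[col] > 0]
    lo = [yi for row, yi in zip(rows, y) if row[col] < 0]
    print(f"{factor}: {sum(hi)/6 - sum(lo)/6:+.2f} N")
# A +13.13, B +1.80, C +4.27, D +0.50, E +4.53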

Choosing Your Design — Decision Framework

?
How many factors are you studying?
2–4 factors
Full Factorial 2ᵏ
4–16 runs. No aliasing. Estimate everything. Best choice when interactions are expected and budget allows.
5–7 factors
Half Fraction 2ᵏ⁻¹
16–32 runs. Resolution IV or V. Main effects clear. Some 2FIs estimable. Good balance of efficiency and information.
6–10 factors
Quarter Fraction 2ᵏ⁻²
8–16 runs. Resolution III. Main effects only (assuming 2FIs negligible). Screening only — follow up on winners.
8–20 factors
Plackett-Burman
12–24 runs. Resolution III. Highly efficient screening. Main effects partially confounded with all 2FIs. Always follow up.

Screening & Fractional Factorial Designs

When you have many potential factors, running a full factorial is impractical — a 2⁷ design requires 128 runs. Screening designs let you study 5–15+ factors in far fewer runs by deliberately aliasing (confounding) higher-order interactions with main effects. The goal is to identify the vital few factors that drive most of the variation, then follow up with a focused optimisation study.

Core Concept — Resolution & Aliasing
Resolution III
Main effects aliased with 2FI
Use only for screening when 2-factor interactions are assumed negligible.
Resolution IV
Main effects clear; 2FI aliased with 2FI
Main effects are not aliased with 2FIs — a good balance of economy and interpretability.
Resolution V
Main effects & 2FI clear; 3FI aliased
Main effects and 2-factor interactions are both estimable. Preferred for optimisation follow-up.

Sixteenth Fraction: 2⁷⁻⁴ Screening Design — 8 Runs for 7 Factors

A plastics injection moulding team suspects 7 process variables affect warpage. A full 2⁷ requires 128 runs — weeks of production time. A saturated 2⁷⁻⁴ Resolution III design needs only 8 runs.

Factor | Label | Low (−1) | High (+1)
Melt Temperature | A | 220°C | 260°C
Injection Speed | B | 60 mm/s | 100 mm/s
Hold Pressure | C | 40 MPa | 80 MPa
Hold Time | D | 5 s | 15 s
Cooling Time | E | 10 s | 25 s
Gate Size | F | Small | Large
Mould Temp | G | 30°C | 60°C

The 2⁷⁻⁴ design uses a base 2³ design in A, B, C — then assigns D=AB, E=AC, F=BC, G=ABC. This is a saturated Resolution III design: every main effect is aliased with two-factor interactions (D with AB, E with AC, and so on), so interpreting the main effects requires assuming those interactions are negligible.

Run | A | B | C | D=AB | E=AC | F=BC | G=ABC | Warpage (mm)
1 | −1 | −1 | −1 | +1 | +1 | +1 | −1 | 0.42
2 | +1 | −1 | −1 | −1 | −1 | +1 | +1 | 0.61
3 | −1 | +1 | −1 | −1 | +1 | −1 | +1 | 0.38
4 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | 0.55
5 | −1 | −1 | +1 | +1 | −1 | −1 | +1 | 0.47
6 | +1 | −1 | +1 | −1 | +1 | −1 | −1 | 0.58
7 | −1 | +1 | +1 | −1 | −1 | +1 | −1 | 0.44
8 | +1 | +1 | +1 | +1 | +1 | +1 | +1 | 0.72

Calculating Main Effect Estimates

Each main effect = (average of high runs − average of low runs). For factor A (Melt Temperature):

Effect A = ¼[(0.61+0.55+0.58+0.72) − (0.42+0.38+0.47+0.44)]
Effect A = ¼[2.46 − 1.71] = ¼[0.75] = +0.188
Melt temperature at the high level increases warpage by ~0.19 mm on average (divide by 4 because each level average pools 4 of the 8 runs).
Factor | Effect Estimate | Abs. Effect | Verdict
A — Melt Temperature | +0.188 | 0.188 | ★ Active
B — Injection Speed | +0.003 | 0.003 | Inert
C — Hold Pressure | +0.063 | 0.063 | Marginal
D — Hold Time | +0.038 | 0.038 | Inert
E — Cooling Time | +0.008 | 0.008 | Inert
F — Gate Size | +0.053 | 0.053 | Marginal
G — Mould Temp | +0.048 | 0.048 | Marginal
Screening outcome: A (Melt Temperature) is clearly active. C (Hold Pressure), F (Gate Size) and G (Mould Temp) are marginal — and at Resolution III each is aliased with two-factor interactions — so they are worth carrying into a follow-up study. B, D and E can be held at convenient settings. The team now runs a focused follow-up on A plus the marginal factors: a handful of runs instead of the original 128. The sketch below reproduces these estimates.
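A minimal numpy sketch: build the three base columns, derive the four generated ones, and difference the means.

# Sketch: generate the 2^(7-4) design from its base 2^3 and generators,
# then estimate the seven main effects with numpy.
import numpy as np

A = np.array([-1, +1, -1, +1, -1, +1, -1, +1])
B = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
C = np.array([-1, -1, -1, -1, +1, +1, +1, +1])
D, E, F, G = A * B, A * C, B * C, A * B * C      # the four generators

warpage = np.array([0.42, 0.61, 0.38, 0.55, 0.47, 0.58, 0.44, 0.72])

for name, col in zip("ABCDEFG", [A, B, C, D, E, F, G]):
    eff = warpage[col == +1].mean() - warpage[col == -1].mean()
    print(f"{name}: {eff:+.3f} mm")
# Matches the table above (A +0.188, B +0.003, C +0.063, ...).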

Plackett-Burman Designs

Plackett-Burman (PB) designs are Resolution III screening designs that study up to N−1 factors in N runs, where N is a multiple of 4 (12, 20, 24, 28…). They are more economical than fractional factorials for large factor counts but have complex aliasing — every main effect is partially aliased with every 2-factor interaction not involving that factor.

Design | Runs | Max Factors | Resolution | Best Used For
PB-12 | 12 | 11 | III | Rapid screening; 2FI negligible assumption
PB-20 | 20 | 19 | III | Large screening studies
2⁴⁻¹ | 8 | 4 | IV | 4-factor screening; 2FI estimable with follow-up
2⁵⁻² | 8 | 5 | III | 5-factor screening; main effects only
2⁶⁻² | 16 | 6 | IV | 6-factor study; cleaner aliasing than PB
2⁷⁻³ | 16 | 7 | IV | 7-factor screening with good resolution
2⁷⁻⁴ | 8 | 7 | III | Maximum economy; 7 factors in 8 runs

Design Selection Decision Guide

Choose Fractional Factorial when…
  • You want clean, interpretable aliasing
  • You may need Resolution IV or V
  • Factor count is modest (4–8 factors)
  • You anticipate a follow-up optimisation study
Choose Plackett-Burman when…
  • You have 9–19 factors to screen
  • Resources are very limited
  • 2-factor interactions are expected to be small
  • You only need to identify the vital few factors
Practitioner rules of thumb
  • Always randomise run order to protect against lurking time trends.
  • Add 2–4 centre points to check for curvature without inflating run count.
  • Use a half-normal plot to visually separate active effects from noise.
  • If a 2FI is important, upgrade to Resolution V or run a follow-up fold-over.
  • Screen first, optimise second — never skip directly to RSM on 8+ factors.

Taguchi Methods

Genichi Taguchi developed a system for improving quality by designing processes that are robust — insensitive to noise factors like temperature drift, humidity, and raw material variation. His philosophy: it is cheaper to design robustness in than to control every noise factor in production.

🎯 Taguchi's View of a Process — Signal, Noise, and Response
[Diagram: controllable signal factors (set by the engineer) and uncontrollable noise factors (temperature, humidity, material lot) both feed the PROCESS, which produces the RESPONSE (Y). Target: on-target with low variation. Goal: make Y insensitive to noise by choosing the right control factor levels.]

Signal-to-Noise (S/N) Ratios

The S/N ratio is the primary Taguchi response metric. A higher S/N = a more robust product/process. The formula depends on the optimization objective.

Objective | S/N Formula | Use When | Example
Smaller is Better | S/N = −10 log(Σy²/n) | Defects, contamination, noise, corrosion, error | Minimise weight loss in corrosion test
Larger is Better | S/N = −10 log(Σ(1/y²)/n) | Strength, yield, throughput, efficiency | Maximise bond strength, chemical yield
Nominal is Best | S/N = 10 log(ȳ²/s²) | Dimensional tolerances, target values | Hit target wall thickness of 3.0 mm ± 0.1
Ordered Categorical | S/N based on scores | Attribute data with ranked categories | Defect severity: none / minor / major / critical
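The three quantitative formulas are one-liners in code. A minimal sketch; the reading list is a hypothetical wall-thickness sample near the 3.0 mm target, not data from the table:

# S/N ratio sketch for the three quantitative Taguchi objectives.
import math

def sn_smaller_is_better(y):
    return -10 * math.log10(sum(v * v for v in y) / len(y))

def sn_larger_is_better(y):
    return -10 * math.log10(sum(1 / (v * v) for v in y) / len(y))

def sn_nominal_is_best(y):
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / (len(y) - 1)   # sample variance
    return 10 * math.log10(mean * mean / var)

readings = [2.98, 3.01, 3.02, 2.99]   # hypothetical wall thickness, target 3.0
print(f"Nominal-is-best S/N = {sn_nominal_is_best(readings):.1f} dB")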

Taguchi Orthogonal Arrays

Taguchi developed standardised balanced designs called orthogonal arrays (L4, L8, L9, L12, L16…). The notation L₈(2⁷) means: 8 runs, up to 7 factors, each at 2 levels.

Array | Runs | Max Factors | Levels | Has Interaction Table? | Best Use
L4 | 4 | 3 | 2 | Yes | Quick 3-factor screen; plastic sealing example
L8 | 8 | 7 | 2 | Yes (2⁷) / No (mixed 4¹×2⁴ variant) | Standard 2-level screening; steel heat-treat example
L9 | 9 | 4 | 3 | No | 3-level factors; plastic processing with 4 factors
L12 | 12 | 11 | 2 | No | Main effects only; interactions roughly distributed to all columns
L16 | 16 | 15 | 2 | Yes | Large 2-level screening

Accuracy vs Precision — Taguchi's Starting Point

🎯 Four Combinations of Accuracy and Precision
  • Accurate + Precise ✓ — the goal
  • Precise, not accurate — fix with adjustment
  • Accurate, not precise — reduce variation via DOE
  • Neither — worst case ✗ — both DOE and adjustment needed

Mixture Designs

Mixture designs are used when the factors are components of a mixture that must sum to a constant (typically 100% or 1.0). The response depends on the proportions of ingredients, not their absolute amounts. Common in chemicals, food, pharmaceuticals, and polymer formulation.

⚠️

Key constraint: x₁ + x₂ + x₃ + … = 1. Because of this constraint, standard factorial designs cannot be used directly — you cannot independently vary all components. The feasible experimental region is a simplex (triangle in 3D, tetrahedron in 4D).

🔺 Three-Component Mixture Design — The Simplex
[Diagram: three-component simplex. Vertices: A = 1.0, B = 1.0, C = 1.0 (pure components). Edge midpoints: 0.5A/0.5B, 0.5A/0.5C, 0.5B/0.5C. Centroid: ⅓A ⅓B ⅓C. Simplex with quadratic + cubic design points.]
Design Type | Points Included | Model Fitted | Use When
Simplex Design | Vertices only (pure components) | Linear | First screening — assume no blend effects
Simplex Centroid | Vertices + midpoints + centroid | Quadratic / Cubic | When blend synergism or antagonism is likely
Simplex Lattice | Evenly spaced grid across simplex | Polynomial (degree q) | Space-filling coverage; complex response surfaces
Extreme Vertices | Constrained vertices + centroid | Quadratic / special cubic | When components have upper/lower bounds (real formulations)
💡

Blown Film Example: A polymer film is made from three components (A, B, C) that must total 100%. The team runs a three-component quadratic simplex design and measures tensile strength. The model identifies the optimal blend ratio that maximises strength — something impossible to find with OFAT or standard factorial designs.

DOE Quick Reference — Exam Summary

Design Selection Guide

🗺️ Which Design Should I Use?
How many factors? And do you need to detect interactions?
  • 2–4 factors AND you need all interactions → Full Factorial 2ᵏ — 4 factors = 16 runs (2⁴) or 8 runs with replication (2³)
  • 5–8 factors, some interactions → Fractional Factorial (Res V or higher) — 2⁵⁻¹ = 16 runs (Res V); identifies all main effects + 2FIs
  • 7+ factors, main effects only → Plackett-Burman screening (Res III) — 12 runs → 11 factors; identify the vital few for follow-up studies
  • Need robustness against noise? → Taguchi Orthogonal Array (L4, L8, L9, L12…)

Key Formulas at a Glance

Quantity | Formula | Notes
Main Effect E(A) | E(A) = Ȳ(A+) − Ȳ(A−) | Average response at high level minus average at low level
Std dev of experiment | sₑ = √(Σs²/k) | k = number of runs; s² = variance per run
Std dev of effects | sEff = sₑ × √(4/n) | n = total number of trials
Degrees of freedom | df = (obs/run − 1) × runs | If obs/run − 1 = 0, use multiplier of 1
Decision limit | DL = t(α/2, df) × sEff | Effects outside ±DL are statistically significant
F-test (variances) | F = s²_larger / s²_smaller | Larger variance always in numerator → one-tail test
Nonlinearity effect | E(NL) = Ȳ_center − Ȳ_grand | Significant → linear model invalid; need ≥3 levels
Residual | Res = Y_observed − Y_predicted | Used for residual analysis in unreplicated designs

Common Pitfalls to Avoid

Trap | Correct Understanding
Repeat vs Replication | Repeat = same conditions, no new setup (does NOT estimate experimental error). Replication = independent new setup (DOES estimate error).
When interaction is significant | The interaction plot is MORE important than the main effect plots. Main effects describe averages; the interaction describes the joint effect.
Hierarchy rule | If an interaction AB is significant, include both main effects A and B in the model — even if A or B alone are not significant.
Significant nonlinearity | If centre points show significant nonlinearity, the linear model is invalid and you cannot interpolate. Must repeat with ≥3 levels.
Variation vs Mean analysis | A factor can be insignificant for the mean but critically important for reducing variation. Always run both analyses.
Resolution III designs | Main effects are aliased with 2-factor interactions. You can identify which factors matter, but you cannot separate main effects from interactions.
Randomisation purpose | Randomisation protects against unknown time-related trends. It is the "insurance policy" — not optional.
OFAT advantage claimed | OFAT CANNOT detect interactions between factors. This is a fundamental limitation, not a minor one. DOE is always better when interactions are possible.
Factor C (ramp time) in yield example | C was not significant for the mean, but was critical for reducing variation. The "diamond factor" — rare and extremely valuable.

DOE Procedural Checklist (10 Practical Rules)

01 Define the objective (maximise, minimise, hit target, reduce variation) before running any experiment
02 Complete MSA before DOE — a bad gauge makes a capable process look incapable
03 Stabilise the process with SPC before running DOE — special causes inflate experimental error
04 Set factor levels boldly in screening — wide spacing makes significant factors easier to detect
05 Always randomise trial order — it is the insurance policy against unknown external influences
06 Run centre points to check for nonlinearity in quantitative factors (at least 4 centre points)
07 Verify the model prediction at the recommended conditions before implementing process changes
08 Non-significant factors are set on the basis of cost, productivity, or convenience only
09 Execute a line clearance before and after DOE to prevent mix-ups or commingling of products
10 Report conclusions in plain language — your audience understands Pareto charts, not t-statistics
Design for Six Sigma · From Concept to Commercial Success

Design for Six Sigma (DFSS)

DFSS is not an improvement methodology — it is a design methodology. Where DMAIC fixes a broken process, DFSS builds the right process from scratch. Used when you are creating something new: a product, a service, a manufacturing line. The goal is to design quality in, not inspect it out.

What is DFSS — and when do you use it?

DFSS answers one question: "How do we build a product that is right first time, every time, at the right cost?" It is not a repair kit. It is a design philosophy applied before a single part is cut.

✓ Use DFSS when...
  • Designing a completely new product or service
  • Existing process cannot meet new requirements
  • Entering a new market or technology domain
  • Customer requirements are not yet fully understood
  • Target sigma level is ≥ 4.5σ from the start
⚠ Use DMAIC instead when...
  • An existing process is underperforming
  • Root cause is unknown but process exists
  • Incremental improvement is the goal
  • Product design is already locked
  • Defect rate needs reduction in current production

The Four DFSS Methodologies — Side by Side

DFSS is not one framework — it is a family. Different industries and organisations use different variants. All share the same core philosophy.

Methodology | Phases | Best For | Origin
DMADV | Define · Measure · Analyse · Design · Verify | New product or process design — the most widely taught | GE, Motorola
IDOV | Identify · Design · Optimise · Validate | Hardware-heavy design; aerospace, automotive | Six Sigma Academy
DMADOV | Define · Measure · Analyse · Design · Optimise · Verify | Complex multi-stage designs needing explicit optimisation loop | Honeywell
CDOV | Concept · Design · Optimise · Verify | Product platform design, systems engineering | Creveling
💡

Which should you use? DMADV is the best starting point — it maps cleanly to the Six Sigma belt structure, has the richest toolset documentation, and is recognised across industries. This module teaches DMADV throughout, with notes on where the others differ.

DFSS vs DMAIC — The Core Difference

DMAIC
Fix what exists

You have a process. It is producing defects. You investigate, find root causes, implement solutions. Improvement happens on an existing platform.

vs
DMADV / DFSS
Build what doesn't exist yet

You have a customer need. Nothing exists yet. You translate that need into requirements, generate concepts, select and optimise the best one, then validate it meets the requirements.

📌

The 70% rule: It is widely cited that 70–80% of a product's quality and cost is determined at the design stage. DFSS is the methodology that addresses this window — before tooling is cut, before supply chains are locked, before the cost of change becomes prohibitive.

The DMADV Roadmap — Phase by Phase

Each phase of DMADV has a clear deliverable, a gate review question, and a defined set of tools. You cannot progress to the next phase without answering the gate question. This is what keeps DFSS honest.

D
01
Define
Gate: Is this the right project?

What you do: Establish project scope, business case, customer segments, and high-level requirements. Define what success looks like in measurable terms.

Key tools: Project charter · SIPOC · VOC (interviews, surveys) · Kano model · Business case with ROI
M
02
Measure
Gate: Do we understand customer needs?

What you do: Translate Voice of Customer into Critical to Quality (CTQ) characteristics. Benchmark competitors. Establish target performance levels with measurable specifications.

Key tools: CTQ tree · QFD House of Quality · Competitive benchmarking · Target specification table · Kano classification
A
03
Analyse
Gate: Have we selected the best concept?

What you do: Generate multiple design concepts. Use structured methods to evaluate and select the best. Identify critical design parameters and their relationships to CTQs.

Key tools: Pugh concept selection · TRIZ · Morphological chart · Design FMEA (risk identification) · Transfer function mapping
D
04
Design
Gate: Does the design meet targets?

What you do: Develop the detailed design. Run DOE to optimise critical parameters. Apply tolerance design and Design for Manufacture/Assembly (DFM/DFA). Predict capability.

Key tools: DOE (factorial, RSM) · Taguchi robustness · Tolerance stack-up · Monte Carlo simulation · DFM/DFA · Predicted Cpk
V
05
Verify
Gate: Is it ready for full production?

What you do: Validate the design against customer requirements using prototypes and pilot runs. Confirm predicted capability with real data. Hand off to production with full control plan.

Key tools: Pilot run capability study · MSA · Control plan · PFMEA · Design validation testing · Ppk confirmation

Voice of Customer — From Feedback to Specification

VOC is the most underinvested step in most organisations. Teams rush to design solutions before truly understanding the problem. DFSS forces you to slow down here — because every hour spent understanding customers saves ten hours of redesign later.

Step 1 — Gather VOC Data

Direct Methods
  • Customer interviews (structured)
  • Focus groups
  • Field observation (Gemba)
  • Prototype feedback sessions
Indirect Methods
  • Warranty & complaint data
  • Online reviews mining
  • Sales team feedback
  • Regulatory requirements
Competitive Intel
  • Teardown analysis
  • Patent landscape
  • Benchmarking studies
  • Industry standards review

Step 2 — Kano Model: Not All Requirements Are Equal

The Kano model sorts customer requirements into three categories. Knowing which category each requirement falls into prevents over-engineering the basics and missing the delighters.

⚠️
Must-Be

Expected basics. Their presence doesn't delight — their absence causes immediate rejection. Example: a car must start reliably.

📈
Performance

More is better. Directly proportional to satisfaction. Example: fuel economy — customers always want more.

Delighter

Not expected, but creates strong positive reaction. Example: automatic parking — customers didn't ask, but love it.

Step 3 — CTQ Tree: Translate Words into Numbers

A CTQ tree converts vague customer language into specific, measurable engineering requirements. Each branch goes from customer need → driver → specification.

Example: Medical Infusion Pump
Customer Need | Driver | CTQ Specification
"I need to know the pump is working correctly" | Alarm reliability | Alarm response ≤ 2 seconds, 100% of the time
"I need it to be easy to carry" | Portability | Weight ≤ 800 g, handle grip force ≤ 15 N

Step 4 — QFD: Linking Customer Needs to Design Parameters

Quality Function Deployment (QFD) — also called the House of Quality — ensures every engineering decision can be traced back to a customer requirement. It prevents the classic trap of designing what is technically elegant rather than what is actually needed.

Customer Need | Importance (1–5) | Design Parameter | Relationship | Target
Light weight | ⭐⭐⭐⭐⭐ 5 | Enclosure material density | Strong (9) | ≤ 1.5 g/cm³
Accurate dosing | ⭐⭐⭐⭐⭐ 5 | Pump mechanism tolerance | Strong (9) | ±0.5% dose accuracy
Long battery life | ⭐⭐⭐⭐ 4 | Motor efficiency | Medium (3) | ≥ 72 hr at standard rate
Alarm is audible | ⭐⭐⭐⭐ 4 | Speaker output power | Strong (9) | ≥ 75 dB at 1 m

Concept Design — Generating and Selecting the Best Idea

This is where most engineers spend too little time. The quality of your final design is bounded by the quality of your concept space. If you evaluate only one concept, you are not designing — you are just executing an assumption.

Morphological Chart — Systematic Concept Generation

A morphological chart forces you to decompose the design problem into independent sub-functions and generate alternatives for each. Combining one option from each row creates a unique concept.

Sub-function | Option A | Option B | Option C
Power source | Rechargeable Li-ion | Disposable alkaline | Mains powered
Pump mechanism | Peristaltic | Syringe driver | Rotary gear
Display type | LCD numeric | OLED graphic | LED indicator only
Alarm | Audible buzzer | Vibration + audible | Wireless to receiver
Housing material | ABS plastic | Polycarbonate | Aluminium alloy

The above chart yields 3⁵ = 243 possible concepts. You don't evaluate all of them — you use engineering judgment to select 3–5 promising combinations for formal comparison.

Pugh Concept Selection — Structured Comparison Against a Datum

The Pugh matrix evaluates concepts against criteria using a datum (reference concept, often the current design or market leader). Scores: + (better), − (worse), S (same).

Criterion | Weight | Datum (Concept A) | Concept B | Concept C | Concept D
Weight | 5 | D | + | S | +
Battery life | 4 | D | S | + | −
Dose accuracy | 5 | D | + | S | +
Alarm clarity | 4 | D | S | + | S
Manufacturability | 3 | D | − | S | +
Weighted score | | 0 | +14 | +13 | +11
💡

The Pugh matrix does not give you the answer — it structures your thinking. Concept B scores highest, but notice its manufacturability weakness. The right response is not to blindly select B, but to ask: "Can we redesign B to address manufacturability while keeping its weight and accuracy advantages?"

Transfer Functions — Linking Design to CTQ

A transfer function is a mathematical relationship: CTQ = f(design parameters). You must establish this before running experiments. Without it, you cannot predict the effect of design changes.

Example: Pump dose accuracy
Dose Volume = (Motor speed × Stroke length × Cross-section area) / Mechanical efficiency
Y = f(RPM, L, A, η)
Each parameter becomes a factor in the DOE. The transfer function tells you which factors matter most.

Design Optimisation — Finding the Best Settings

Once you have a chosen concept and transfer functions, you optimise. This means running designed experiments to find the factor settings that simultaneously maximise performance and minimise sensitivity to variation.

The Two-Step Optimisation Strategy (Taguchi)

Step 1 — Minimise Variation

Find the factor settings that make the output least sensitive to noise (uncontrollable variation). Use Signal-to-Noise ratio as the optimisation metric. Fix these settings first.

Step 2 — Hit the Target

With variation minimised, use a scaling factor (a factor that affects mean but not variance) to move the mean to the target. This preserves the robustness gained in Step 1.

Signal-to-Noise Ratios — Choosing the Right One

Characteristic | S/N Formula | When to use | Example
Smaller-the-Better | −10·log(Σy²/n) | Defect counts, vibration, shrinkage — zero is ideal | Dimensional deviation, leakage rate
Larger-the-Better | −10·log(Σ(1/y²)/n) | Strength, yield, life — more is always better | Tensile strength, battery life
Nominal-the-Best | 10·log(µ²/σ²) | Target value with symmetric tolerance | Shaft diameter, fill volume, resistance

Response Surface Methodology (RSM)

When factors are continuous and you need to find an optimal point (not just compare levels), RSM maps the response across the design space. It answers: "At exactly what values of A and B is Y maximised?"

Central Composite Design (CCD)

2ᵏ factorial + star points (±α) + centre points. Fits a full quadratic model. Best for 2–5 continuous factors. Rotatable — equal prediction variance at equal distance from centre.

Box-Behnken Design (BBD)

Midpoints of cube edges + centre points. Never tests extreme corners — safer when extreme combinations are physically dangerous or impossible. Fewer runs than CCD for k ≥ 3.

📌

The RSM optimum is not the same as "maximise the CTQ." You optimise Value minus Cost. A material that gives 5% better strength but costs 40% more may not be the right choice. Always include cost in the optimisation objective.

Tolerance Design and Variation Management

Tolerances are not free. Too tight — manufacturing cost explodes. Too loose — the product fails in the field. Tolerance design finds the optimal balance using statistical methods rather than engineering gut feel.

Tolerance Stack-Up Analysis

When multiple components assemble together, their individual dimensional variations combine. The question is: what is the probability the assembly falls within its specification?

Worst-Case Method
T_assembly = Σ|Tᵢ|

Guarantees 100% of assemblies work, but assumes all parts are at their worst-case limits simultaneously. Very conservative — drives unnecessarily tight component tolerances.

Statistical (RSS) Method
σ_assembly = √(Σσᵢ²)

Accounts for the fact that all parts being at worst-case simultaneously is extremely unlikely. Allows looser component tolerances for the same assembly yield. Requires knowledge of σᵢ per component.
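The two methods differ only in how the component tolerances combine. A minimal sketch with four hypothetical ±tolerances, treating each as a ±3σ band for the RSS case:

# Tolerance stack-up sketch: four hypothetical components in a linear stack.
import math

tolerances = [0.10, 0.05, 0.08, 0.12]             # +/- mm per component

worst_case = sum(tolerances)                       # guaranteed envelope
# RSS: treat each +/-T as a +/-3 sigma band, so sigma_i = T_i / 3
rss = 3 * math.sqrt(sum((t / 3) ** 2 for t in tolerances))

print(f"Worst case: +/-{worst_case:.3f} mm")       # +/-0.350
print(f"RSS (3-sigma): +/-{rss:.3f} mm")           # +/-0.182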

Propagation of Variance — The Design Engineer's Formula

If the CTQ (Y) is a function of multiple input variables (X₁, X₂, ...), how does variation in the inputs propagate to variation in Y?

σ²_Y ≈ Σᵢ (∂Y/∂Xᵢ)² · σ²_Xᵢ

The partial derivative (∂Y/∂Xᵢ) is the sensitivity coefficient — how much Y changes per unit change in Xᵢ. Squared and multiplied by the variance of Xᵢ.

💡

Practical insight: The sensitivity coefficient squared means that the dominant source of variation in Y is often one or two inputs with high sensitivity — not all inputs equally. Focus tolerance investment on the highest-sensitivity parameters.

Monte Carlo Simulation for Tolerance Verification

When the transfer function is complex or non-linear, analytical propagation is difficult. Monte Carlo simulation draws random values from each input distribution, computes Y, and builds up a Y distribution from thousands of trials.

5 Steps
  1. Define distributions for each input (X₁, X₂, ...) — mean and std dev from capability data
  2. Randomly sample one value from each input distribution
  3. Compute Y using the transfer function
  4. Record the Y value. Repeat 10,000+ times.
  5. The resulting Y distribution gives you predicted Cpk, % out-of-spec, and percentiles
📌

Monte Carlo answers the question your tolerance stack-up cannot: "What is the actual predicted yield of this assembly design, given real component capability data?" Use it before committing to tooling.
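The five steps map directly onto a vectorised simulation. The sketch below uses the pump transfer function from the Analyse section with illustrative input distributions and an assumed ±2% specification; all numbers are placeholders, not project data.

# Monte Carlo tolerance-verification sketch (assumes numpy). Input means,
# sigmas and the spec width are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

rpm    = rng.normal(120.0, 0.20, n)    # motor speed
stroke = rng.normal(8.00, 0.010, n)    # stroke length, mm
area   = rng.normal(12.0, 0.020, n)    # cross-section, mm^2
eta    = rng.normal(0.92, 0.002, n)    # mechanical efficiency

dose = rpm * stroke * area / eta       # transfer function Y = f(X)

nominal = 120.0 * 8.00 * 12.0 / 0.92
lsl, usl = 0.98 * nominal, 1.02 * nominal          # +/-2% spec (assumed)

sigma = dose.std(ddof=1)
ppk = min(usl - dose.mean(), dose.mean() - lsl) / (3 * sigma)
out_of_spec = 100 * np.mean((dose < lsl) | (dose > usl))

print(f"Predicted Ppk = {ppk:.2f}, out of spec = {out_of_spec:.3f}%")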

Verify — Confirming the Design Works in the Real World

Verification is not the last step — it is the proof that all previous steps were done correctly. A strong Verify phase should produce no surprises. If it does, it means the Analyse or Design phases were incomplete.

Verification vs Validation — Know the Difference

Verification

"Did we build it right?"

Confirms the design meets its specifications. Compares actual measurements to design targets. Typically done on prototypes and pre-production units.

Validation

"Did we build the right thing?"

Confirms the design meets customer needs in real use conditions. Typically done with real users in real environments. Answers the VOC question from Phase 1.

Capability Confirmation — The Ppk Requirement

The pilot run is your first real capability data. Minimum requirement: Ppk ≥ 1.67 for new designs going to production (some industries require ≥ 2.00). Calculate Ppk — not Cpk — because Ppk includes all sources of long-term variation.

Index | Formula | What it tells you | Target
Cp | (USL−LSL)/(6σ̂) | Potential: does the spec window fit the process? | ≥ 2.00 for new design
Cpk | min(Cpu, Cpl) | Short-term actual: centred and capable? | ≥ 1.67 for new design
Ppk | min(Ppu, Ppl) using s_total | Long-term actual: including all drift and shifts | ≥ 1.33 in production

Design Scorecard — Closing the Loop

Every CTQ identified in Measure must be verified in this phase. The design scorecard maps each requirement to its measured result.

CTQ | Target | Tolerance | Measured | Ppk | Status
Dose accuracy | 0% deviation | ±0.5% | ±0.31% | 1.82 | ✓ Pass
Weight | 750 g | ≤ 800 g | 763 g | | ✓ Pass
Alarm response | 1.2 s | ≤ 2.0 s | 1.4 s | 2.1 | ✓ Pass
Battery life | 80 hr | ≥ 72 hr | 77 hr | | ⚠ Monitor

DFSS Toolbox — When to Use What

Phase | Tool | Purpose | Output
Define | Project Charter | Scope, timeline, team, business case | Signed charter document
Define | SIPOC | High-level process map | Scope boundaries
Define | VOC methods | Capture customer language before interpreting it | Raw VOC statements
Measure | Kano model | Classify requirements by type | Kano chart
Measure | CTQ tree | Translate VOC to measurable specs | CTQ specifications with LSL/USL
Measure | QFD / House of Quality | Link customer needs to engineering parameters | Prioritised design parameters
Analyse | Morphological chart | Systematic concept generation | Concept alternatives
Analyse | Pugh matrix | Structured concept selection | Winning concept with rationale
Analyse | Design FMEA | Identify design failure risks early | Risk register + mitigation actions
Design | Screening DOE | Identify the vital few factors | Significant factors list
Design | Taguchi / Robust design | Minimise sensitivity to noise | Robust parameter settings
Design | RSM / CCD | Find optimal factor settings | Contour plots, optimal point
Design | Tolerance design | Allocate tolerances statistically | Component tolerance targets
Verify | Pilot run Ppk study | Confirm capability in production | Ppk ≥ 1.33
Verify | MSA / GR&R | Confirm measurement system is adequate | %GR&R ≤ 10%
Verify | Design scorecard | Close the loop on every CTQ | Pass/fail per requirement

Full Project Walkthrough: Designing a Smart Water Meter

Follow one product through the complete DMADV process — from customer complaint to production-ready design. This is the kind of project a Black Belt would lead over 6–9 months.

The Brief

A utility company wants to replace 500,000 mechanical water meters with smart digital meters over 5 years. Current meters have a 12% annual replacement rate due to reading errors, jamming, and battery failure. The project team must design a new smart meter that customers trust and engineers can manufacture to ≥ 4.5σ.

DEFINE

Business case: 12% replacement rate × 500,000 meters × £85/replacement = £5.1M/year avoidable cost. Reducing to 2% saves £4.1M/year. Project charter signed. Team: 1 Black Belt, 2 Green Belts, design engineer, manufacturing engineer, customer service lead.

Scope
New meter design only — no installation process
Timeline
9 months to pilot, 18 months to full launch
Target
Annual replacement rate ≤ 2% within 3 years
MEASURE

VOC gathered from 80 interviews (householders, plumbers, meter readers, utility managers). Top themes:

Customer Voice | Kano Type | CTQ Specification
"I need to trust the reading is accurate" | Must-Be | Reading accuracy ±0.5% of actual volume
"It should last without maintenance" | Must-Be | Battery life ≥ 10 years at standard transmission rate
"I want to see my usage on my phone" | Performance | Data transmission ≤ 15 min latency, 99.5% uptime
"No leaks around the meter body" | Must-Be | IP68 rated — 1 m immersion for 30 min, zero leakage
"Easy to read without bending down" | Delighter | Remote reading via app — no physical access needed
ANALYSE

Three concepts generated from morphological chart, then evaluated in Pugh matrix:

Concept | Flow sensor | Comms | Battery | Pugh score
A — Ultrasonic (datum) | Ultrasonic | LoRaWAN | Li-thionyl | 0 (datum)
B — Magnetic | Magnetic | NB-IoT | Li-thionyl | −7
C — Ultrasonic + NB-IoT | Ultrasonic | NB-IoT | Li-SOCl₂ | +18 ✓ Selected

Key insight from DFMEA: Ultrasonic transducer bond failure identified as top risk (RPN 280). Mitigation: change from adhesive bond to mechanical clamp with O-ring seal. RPN reduced to 48 after redesign.

DESIGN

DOE results: L9 Taguchi OA run on 4 factors (transducer gap, signal frequency, temperature compensation algorithm, housing wall thickness). Two CTQs measured: reading accuracy and signal strength.

Factor | Effect on Accuracy | Effect on Signal | Optimal Setting
Transducer gap | Significant ✓ | Not significant | 8.5 mm ± 0.2 mm
Signal frequency | Significant ✓ | Significant ✓ | 1.0 MHz
Temp. compensation | Significant ✓ | Not significant | Algorithm v3 (quadratic)
Wall thickness | Not significant | Significant ✓ | 3.5 mm (min weight)

Tolerance design: Monte Carlo simulation (10,000 runs) with production Cpk data from transducer supplier predicts assembly accuracy Ppk = 1.87 — exceeding the 1.67 target. Transducer gap tolerance tightened from ±0.5 mm to ±0.2 mm based on sensitivity analysis.

VERIFY

Pilot run: 200 units manufactured at supplier. Full measurement on all CTQs.

CTQ | Target | Pilot Result | Ppk | Status
Reading accuracy | ±0.5% | ±0.28% avg | 1.93 | ✓ Pass
Battery life (projected) | ≥ 10 yr | 12.3 yr (accelerated test) | | ✓ Pass
Transmission latency | ≤ 15 min | 4.2 min avg | | ✓ Pass
IP68 seal integrity | Zero failures | 0/200 failures | | ✓ Pass

Project outcome: Design approved for full production. Projected annual replacement rate: 1.8% — below the 2% target. Estimated annual saving vs current state: £4.3M. Full deployment over 5 years. DFSS project closed.

DFSS Quick Reference

Phase | Gate Question | Key Deliverable | Common Mistake
Define | Is this the right problem? | Signed project charter | Scope too broad — fix the scope first
Measure | Do we understand the customer? | CTQ specifications with LSL/USL | Going straight to solutions before completing VOC
Analyse | Is this the best concept? | Selected concept with rationale | Evaluating only one concept — not a selection
Design | Does the design meet targets? | Optimised design with predicted Cpk | Optimising mean without addressing variation
Verify | Is it ready for production? | Ppk ≥ 1.33 on all CTQs | Verifying on prototype, not production tooling

10 Rules That Separate Good DFSS from Bad DFSS

  1. VOC before solutions. You cannot design the right thing if you haven't confirmed what "right" means to the customer.
  2. Measurable CTQs only. "Reliable" is not a CTQ. "Zero failures in 10 years at 95% confidence" is.
  3. At least 3 concepts. One concept is not a selection — it is an assumption with extra steps.
  4. Transfer functions before experiments. Know what you are testing and why before running a single trial.
  5. Optimise variation before mean. A process on target with high variance will drift off target. A robust process stays on target.
  6. Tolerance design is not the last step. Do it during Design, not after all decisions are made.
  7. Ppk, not Cpk, for verification. Cpk is a short-term study. Production will never be as controlled as a capability study.
  8. Design FMEA before prototype. Find failure modes on paper, not in the field.
  9. Gate reviews are not approval ceremonies. Each gate question must have a data-backed answer — not a slide.
  10. DFSS ends at design handoff, not project close. Track production Ppk for 3 months post-launch to confirm predictions.

Advanced: Strategic Experimentation & Value Engineering

This section covers H.E. Cook's DFSS as Strategic Experimentation (SE) approach — a powerful extension that translates experimental results directly into financial projections. Used by teams who need to connect engineering decisions to boardroom metrics: price, market share, and cash flow.

The Three Fundamental Metrics

Cook's insight: in any competitive market, three conditions are always true about your current product. Use them as your strategic compass.

📉
VALUE is too LOW
V(g) — customer willingness to pay

Improve attributes customers actually value

💸
COST is too HIGH
C — variable cost per unit

Reduce variable cost through design choices

🐢
INNOVATION is too SLOW
1/δt — product introduction rate

Compress development cycle with DFSS

Universal Competitive Metric (Cook)
U ≡ (V − C) / δt

Your U must be ≥ your best competitor's U. Improve value, reduce cost, and speed up innovation simultaneously.

Value Curves — Quantifying What Customers Will Pay

The value curve V(g) answers: "If we improve this attribute by X%, how much more will the customer pay?" This converts engineering decisions into price and demand projections.

NIB — Nominal is Best

Interior dimensions, shaft diameter. Ideal is a specific target value. Value decreases if too high or too low.

SIB — Smaller is Better

Defects, vibration, noise. Ideal is zero. V(0) = maximum. Value decreases monotonically.

LIB — Larger is Better

Fuel economy, battery life, strength. Ideal is infinity. V increases with attribute — diminishing returns.

From Experiment to Cash Flow — The Lambda Framework

Lambda (λ) coefficients connect experimental results to financial outcomes. Each λ tells you: "What is the projected change in value, cost, or cash flow if this factor changes from baseline to its experimental level?"

The core formula
λ̂ = XS · Y    where    XS = [X'X]⁻¹ · X'

X is the design matrix, Y is the vector of experimental outcomes, λ̂ gives the projected effect of each factor on each strategic outcome
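In matrix terms this is ordinary least squares. A minimal numpy sketch with a 2² design matrix and hypothetical outcomes; the point is the XS = [X'X]⁻¹·X' mechanics, not the numbers:

# Lambda-estimation sketch: XS = (X'X)^-1 X' applied to hypothetical outcomes.
import numpy as np

# Columns: intercept, factor A, factor B (coded +/-1); one row per run.
X = np.array([[1, -1, -1],
              [1, +1, -1],
              [1, -1, +1],
              [1, +1, +1]], dtype=float)
Y = np.array([31.0, 35.5, 33.2, 39.1])      # hypothetical experimental outcomes

XS = np.linalg.inv(X.T @ X) @ X.T           # the XS matrix defined above
lam = XS @ Y                                # grand mean and per-factor coefficients
print(lam)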

💡

The full SE methodology — including Monte Carlo cash-flow simulation, Cournot-Bertrand pricing, and the DV survey method — is mathematically rigorous and beyond most DFSS projects. It is most valuable in oligopoly markets where small value improvements translate to large market share shifts. Reference: H.E. Cook, Design for Six Sigma as Strategic Experimentation (ASQ Quality Press).

Defense Quality Standards

Military & Defense Quality Standards

Key U.S. military and NATO defense quality standards — with full coverage of MIL-STD-1916 (DoD Preferred Methods for Acceptance of Product), including all sampling tables, worked examples, and switching rules.

MIL-STD-1916 — DoD Preferred Methods for Acceptance of Product

Published 1 April 1996. The fundamental philosophy shift: away from AQL-based detection (sampling to find defects) toward prevention-based quality systems (SPC, process control, continuous improvement).

💡

The core philosophy (Foreword §7): "Contractors are responsible for establishing their own manufacturing and process controls. Contractors are expected to use recognized prevention practices such as process controls and statistical techniques." Sampling inspection alone does not control or improve quality — it is redundant when effective process controls exist.

Two Acceptance Paths

Path A — Contractor-Proposed

Submit a prevention-based quality system as alternate to sampling. Must demonstrate:

  • Documented quality system plan
  • Process focus (SPC, FMEA, PDCA evidence)
  • Objective evidence of effectiveness
  • Cpk: Critical≥2.00, Major≥1.33, Minor≥1.00
Path B — Acceptance by Tables

Use the prescribed sampling plans indexed by Verification Level and Code Letter. Three plan types:

  • Table II — Attributes (lot/batch)
  • Table III — Variables (lot/batch)
  • Table IV — Continuous attributes

Verification Levels (VL-I through VL-VII)

VL prescribes the level of significance of a characteristic. VL-VII = highest effort (most critical), VL-I = lowest. Specified in the contract or product specifications.

VL | Significance | Attributes n (CL-A lot) | Variables n (CL-A lot)
T (Tightened) | Highest — tightened inspection | 3072 | 113
VII | Critical | 1280 | 87
VI | Very high | 512 | 64
V | High | 192 | 44
IV | Moderate | 80 | 29
III | Standard | 32 | 18
II | Below standard | 12 | 9
I | Minimum | 5 | 4
R (Reduced) | Minimum — reduced inspection | 3 | 2

Critical Characteristic Requirements (§4.4)

For each critical characteristic, the contractor MUST implement an automated screening or fail-safe manufacturing operation AND apply sampling plan VL-VII to verify performance. When a critical nonconformance is found at any phase:

  • Immediately prevent delivery to Government
  • Notify Government representative
  • Identify the cause
  • Take corrective action
  • Screen ALL available units
🚨

Zero tolerance on critical characteristics. No AQL exists for critical characteristics in MIL-STD-1916 — the acceptance criterion is zero nonconformances, reinforced by automated screening.

📋 Key Definitions (§3)

  • Critical Characteristic

    Must be met to avoid hazardous conditions OR to assure tactical function of major systems (aircraft, tank, missile).

  • Major Characteristic

    Must be met to avoid failure or material reduction of usability. One step below critical.

  • Minor Characteristic

    Departure not likely to reduce usability materially. Least stringent.

  • Verification Level (VL)

    VL-VII = highest sampling effort. VL-I = lowest. Set by contract.

  • Production Interval

    Period of continuous sampling assumed homogeneous quality. Normally a single shift, max one day.

  • Cpk Thresholds (§4.1.2b)

    Critical: ≥2.00   Major: ≥1.33   Minor: ≥1.00 — required for alternate acceptance method.

💡

New to acceptance sampling? The Sampling Theory tab explains OC curves, AQL, RQL, producer/consumer risk, and the mathematics behind these tables — read it first for the full picture.

MIL-STD-1916 Sampling Tables

Three matched plan types — all indexed by VL and Code Letter. The Code Letter (CL) is determined from lot size using Table I.

Table I — Code Letters by Lot Size and VL

Lot Size | VL-VII | VL-VI | VL-V | VL-IV | VL-III | VL-II | VL-I
2–170 | A | A | A | A | A | A | A
171–288 | A | A | A | A | A | A | B
289–544 | A | A | A | A | A | B | C
545–960 | A | A | A | A | B | C | D
961–1,632 | A | A | A | B | C | D | E
1,633–3,072 | A | A | B | C | D | E | E
3,073–5,440 | A | B | C | D | E | E | E
5,441–9,216 | B | C | D | E | E | E | E
9,217–17,408 | C | D | E | E | E | E | E
17,409–30,720 | D | E | E | E | E | E | E
30,721+ | E | E | E | E | E | E | E

Table II — Attributes Sampling (Zero Acceptance)

Acceptance criterion: zero nonconformances in the sample. If any found → reject lot.

CL | T (Tightened) | VII | VI | V | IV | III | II | I | R (Reduced)
A | 3072 | 1280 | 512 | 192 | 80 | 32 | 12 | 5 | 3
B | 4096 | 1536 | 640 | 256 | 96 | 40 | 16 | 6 | 3
C | 5120 | 2048 | 768 | 320 | 128 | 48 | 20 | 8 | 3
D | 6144 | 2560 | 1024 | 384 | 160 | 64 | 24 | 10 | 4
E | 8192 | 3072 | 1280 | 512 | 192 | 80 | 32 | 12 | 5

Table III — Variables Sampling (k and F Criteria)

CL | T | VII | VI | V | IV | III | II | I | R
Sample sizes (nv)
A | 113 | 87 | 64 | 44 | 29 | 18 | 9 | 4 | 2
B | 122 | 92 | 69 | 49 | 32 | 20 | 11 | 5 | 2
C | 129 | 100 | 74 | 54 | 37 | 23 | 13 | 7 | 2
D | 136 | 107 | 81 | 58 | 41 | 26 | 15 | 8 | 3
E | 145 | 113 | 87 | 64 | 44 | 29 | 18 | 9 | 4
k values (one- or two-sided)
A | 3.51 | 3.27 | 3.00 | 2.69 | 2.40 | 2.05 | 1.64 | 1.21 | 1.20
E | 3.76 | 3.51 | 3.27 | 3.00 | 2.69 | 2.40 | 2.05 | 1.64 | 1.21
F values (two-sided double spec only)
A | .136 | .145 | .157 | .174 | .193 | .222 | .271 | .370 | .707
E | .128 | .136 | .145 | .157 | .174 | .193 | .222 | .271 | .370

Variables Acceptance Criteria (§5.2.2.2.3)

Single-sided spec — k criterion
(x̄ − LSL) / s ≥ k
(USL − x̄) / s ≥ k
k values from Table III above (derived from the MIL-STD-414 / ANSI Z1.9 family)
Double-sided spec — Form 1
QL = (x̄ − L) / s
QU = (U − x̄) / s
Both QL and QU must be ≥ k, and the F criterion on s must also be satisfied.

Switching Rules — Normal / Tightened / Reduced

Inspection intensity is not fixed — it responds to demonstrated supplier quality history. Good history earns reduced sampling. Poor performance triggers tightened inspection.

📊 MIL-STD-1916 Inspection Switching Flow
[Flow: NORMAL (starting point; contract VL applies directly) → TIGHTENED (VL shifted left, larger samples) when 2 of 5 lots fail; back to NORMAL after 5 consecutive lots pass with the cause corrected. NORMAL → REDUCED (VL shifted right, smaller samples) after 10 consecutive passes plus Government approval; back to NORMAL if any lot is rejected or production becomes irregular. DISCONTINUATION: remaining tightened too long → Government may halt acceptance.]

Switching Rules — Detailed Criteria

Transition | Trigger (Lot/Batch) | Additional Requirement
Normal → Tightened | 2 lots withheld within last 5 lots |
Tightened → Normal | 5 consecutive lots accepted | Cause for nonconformances corrected
Normal → Reduced | 10 consecutive lots accepted | Steady production rate + Govt. approval
Reduced → Normal | Any 1 lot withheld | OR: irregular production, unsatisfactory QS
Discontinuation | Stays tightened (repeated fails) | Govt. may halt all acceptance
📌

When sampling restarts after discontinuation, it begins at tightened inspection — not normal. Switching procedures are applied independently for each group of characteristics or individual characteristic.

Worked Examples from MIL-STD-1916 Appendix

Example 1 — Attributes Sampling (Wing Nuts, VL-IV)

📋

Inspection for missing thread. VL-IV specified. Table II, attributes plan. Lot sizes vary.

Lot # | Lot Size | CL | Sample n | NCRs Found | Disposition | Stage | Action
1 | 5,000 | D | 160 | 2 | Withhold | N | Start at normal VL-IV
2 | 900 | A | 80 | 0 | Accept | N |
3 | 3,000 | C | 128 | 1 | Withhold | N | 2/5 fail → switch to Tightened
4 | 1,000 | B | 256 | 0 | Accept | T |
5 | 1,000 | B | 256 | 0 | Accept | T |
6 | 900 | A | 192 | 0 | Accept | T |
7 | 2,000 | C | 320 | 0 | Accept | T |
8 | 2,500 | C | 320 | 0 | Accept | T | 5 consec. pass → back to Normal
9 | 3,000 | C | 128 | 0 | Accept | N |
10 | 5,000 | D | 160 | 0 | Accept | N |

Example 2 — Variables, Single-Sided Spec (VL-I)

Maximum operating temperature = 209°F on a circuit board relay. Lot of 40 units. VL-I specified, CL-A → nv = 4, k = 1.64 (from Table III).

Step-by-Step — Variables Sampling, Single-Sided
Step 1 — Measure sample: 197, 188, 184, 205 °F
Step 2 — x̄ = (197+188+184+205) ÷ 4 = 193.5 °F
Step 3 — s = √[Σ(xᵢ−x̄)² ÷ (n−1)] = √(265÷3) = 9.399
Step 4 — Quality Index Q = (USL − x̄) ÷ s
          Q = (209 − 193.5) ÷ 9.399 = 1.649
Step 5 — Compare Q ≥ k: 1.649 ≥ 1.64 ✅
ACCEPT LOT — Q = 1.649 exceeds k = 1.64. The sample mean is sufficiently far from the upper spec limit relative to the process spread. If Q had been < 1.64, the lot would be withheld regardless of whether any individual measurement exceeded 209°F.
Why variables sampling is powerful here: An attributes plan at VL-I CL-A needs n=12 (zero-accept). Variables needs only n=4 — a 67% reduction in sample size — because it uses the actual measurement values, not just pass/fail.

Example 3 — Variables, Double-Sided Spec (VL-I)

Same relay batch. Temperature must stay within 180–209°F. Same 4 measurements: 197, 188, 184, 205. Both QL ≥ k and F criterion must be satisfied.

Lower Quality Index QL
QL = (x̄ − LSL) / s
= (193.5 − 180) / 9.399
= 1.436
vs k = 1.64 → ✗ FAIL
Upper Quality Index QU
QU = (USL − x̄) / s
= (209 − 193.5) / 9.399
= 1.649
vs k = 1.64 → ✅ PASS
F Criterion Check (double-sided only)
F = s / (USL − LSL) = 9.399 / (209−180) = 9.399 / 29 = 0.324
Table F value at VL-I, CL-A = 0.370
Check: F ≤ F_table → 0.324 ≤ 0.370 ✅ PASS
WITHHOLD LOT — QL = 1.436 fails the k criterion (1.64). Even though QU passes and the F criterion passes, both quality indices must pass for a double-sided spec. The process mean is sitting too close to the lower limit. Disposition: 100% screen lot or return to supplier.
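Both dispositions follow from one small routine that applies the k criterion on each specified side and the F criterion when the spec is double-sided. A sketch reproducing Examples 2 and 3:

# MIL-STD-1916 variables acceptance sketch (k and F criteria).
import statistics

def variables_accept(sample, k, f_table, lsl=None, usl=None):
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    ok = True
    if usl is not None:
        ok &= (usl - xbar) / s >= k          # upper quality index QU
    if lsl is not None:
        ok &= (xbar - lsl) / s >= k          # lower quality index QL
    if lsl is not None and usl is not None:  # F criterion, double-sided only
        ok &= s / (usl - lsl) <= f_table
    return ok

temps = [197, 188, 184, 205]
print(variables_accept(temps, k=1.64, f_table=0.370, usl=209))            # True: accept
print(variables_accept(temps, k=1.64, f_table=0.370, lsl=180, usl=209))   # False: withhold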

Example 4 — Continuous Sampling (Spot Welds, VL-II)

📋

CL-C, VL-II → i=116 (clearance number), f=1/48 (sampling frequency).

Item # | Action | Stage
1 | Start 100% screening. i=116. | N
8 | Found defective unit — reset counter. | N
124 | 116 consecutive conforming units cleared → begin sampling f=1/48 | N
9,697 | 200 consecutive conforming sampled → switch to Reduced f=1/68 | R
13,982 | Production interval tripled → CL-C to CL-E, f=1/136 | R
16,290 | Nonconforming unit found → switch to Normal, restart screening i=228 | N
16,518 | 228 consecutive conforming cleared → sampling f=1/96 | N

Key Military & Defence Standards — Deep Reference

Beyond MIL-STD-1916, six standards define how defence contractors predict, test, and manage reliability and safety. Each one has a direct commercial equivalent — knowing both is essential for cross-sector work.

MIL-HDBK-217F — Reliability Prediction of Electronic Equipment

Published 1991. The DoD's framework for predicting failure rates of electronic components and systems during design. Two prediction methods exist — choose based on design maturity.

Method 1 — Parts Count

Used in early design when full stress analysis isn't possible. Requires: component quantities, generic quality level, and use environment. Quick and conservative.

λ_s = Σ(Nᵢ · λ_Gᵢ · πQᵢ)
Method 2 — Parts Stress

Used for detailed design when actual operating stresses are known. More accurate but requires thermal, electrical, and environmental stress data per component.

λ_p = λ_b · πT · πE · πQ · πA
Worked Example — Resistor Failure Rate (Parts Stress)
Component: Carbon film resistor, 1/4 W rating, operating at 0.125 W (50% stress ratio)
Base failure rate: λ_b = 0.0012 failures/10⁶ hours
Temperature factor: πT = 2.8 (85°C junction temp)
Environment factor: πE = 4.0 (GM Ground Mobile)
Quality factor: πQ = 1.0 (MIL-R-11 qualified)
─────────────────────────────────────────────
λ_p = 0.0012 × 2.8 × 4.0 × 1.0 = 0.01344 failures/10⁶ hrs
MTBF = 1 / λ_p = 74.4 million hours (single resistor)

πE is the dominant multiplier — ground mobile environment is 4× more harsh than ground benign. Reducing operating temperature from 85°C → 55°C would cut πT from 2.8 → 1.4, halving the failure rate.
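
The parts-stress arithmetic is a straight product of the base rate and π factors. A quick sketch using the example's values (a real prediction pulls λ_b and every π factor from the handbook's tables):

def lambda_parts_stress(lambda_b, pi_t, pi_e, pi_q, pi_a=1.0):
    """MIL-HDBK-217F parts-stress form: failures per 10^6 hours."""
    return lambda_b * pi_t * pi_e * pi_q * pi_a

lam = lambda_parts_stress(0.0012, pi_t=2.8, pi_e=4.0, pi_q=1.0)
print(f"lambda_p = {lam:.5f} /10^6 h, MTBF = {1 / lam:.1f} million hours")
# -> lambda_p = 0.01344 /10^6 h, MTBF = 74.4 million hours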

MIL-STD-1629A — FMECA (Failure Mode Effects & Criticality Analysis)

The military extension of commercial FMEA. Adds a quantitative Criticality Number and a Criticality Matrix that plots every failure mode visually by severity and probability. Long required on major defence system acquisitions; although cancelled in 1998, it remains the de facto FMECA reference.

Severity Categories
I — CatastrophicDeath / system loss
II — CriticalSevere injury / major damage
III — MarginalMinor injury / minor damage
IV — NegligibleNo injury / negligible damage
Criticality Number Formula
Cm = β × α × λp × t
β = conditional prob of loss
α = failure mode ratio
λp = part failure rate
t = operating time
Criticality Matrix — Hydraulic Brake System FMECA Example

Each failure mode is plotted by severity category (x-axis) vs criticality number Cm (y-axis). Modes in the upper-left require immediate design action.

Failure Mode | Severity | β | α | λp (×10⁻⁶/hr) | t (hrs) | Cm (×10⁻³) | Priority
Seal leak → loss of pressure | I | 0.9 | 0.35 | 4.2 | 2,000 | 2.646 | 🔴 Redesign
Caliper piston stick | II | 0.7 | 0.20 | 3.1 | 2,000 | 0.868 | 🟡 Action
Brake fade under load | III | 0.5 | 0.30 | 2.8 | 2,000 | 0.840 | 🔵 Monitor
Warning light false trigger | IV | 1.0 | 0.15 | 5.0 | 2,000 | 1.500 | 🟢 Accept

Note: A Severity I mode always demands action regardless of Cm value. High Cm on a Severity IV mode (warning light) is acceptable — it's a nuisance, not a safety hazard.
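
Each Cm in the table is one multiplication. A quick Python check of the brake-system rows (λp entered in failures/hour, result shown ×10⁻³ to match the table):

def criticality(beta, alpha, lambda_p, t):
    """MIL-STD-1629A criticality number: Cm = beta * alpha * lambda_p * t."""
    return beta * alpha * lambda_p * t

modes = {
    "Seal leak":                   (0.9, 0.35, 4.2e-6, 2000),
    "Caliper piston stick":        (0.7, 0.20, 3.1e-6, 2000),
    "Brake fade under load":       (0.5, 0.30, 2.8e-6, 2000),
    "Warning light false trigger": (1.0, 0.15, 5.0e-6, 2000),
}
for name, args in modes.items():
    print(f"{name}: Cm = {criticality(*args) * 1e3:.3f} x 10^-3")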

MIL-STD-882E — System Safety

The DoD system safety standard. Required for all acquisitions. Defines hazard identification, risk assessment, and risk management for hardware, software, and human factors. Risk = f(Severity, Probability).

MIL-STD-882E Risk Assessment Matrix
Probability ↓ / Severity → | Cat I — Catastrophic | Cat II — Critical | Cat III — Marginal | Cat IV — Negligible
A — Frequent | 1 High | 3 High | 7 Serious | 13 Medium
B — Probable | 2 High | 5 High | 9 Serious | 16 Medium
C — Occasional | 4 High | 6 Serious | 11 Medium | 18 Low
D — Remote | 8 Serious | 10 Medium | 14 Medium | 19 Low
E — Improbable | 12 Medium | 15 Medium | 17 Medium | 20 Low

High risk (red) = Unacceptable — programme stop until mitigated; formal acceptance only at Component Acquisition Executive level. Serious = acceptance at Programme Executive Officer level. Medium and Low = acceptable with Programme Manager approval.

Real-World Application — F-35 OBIGGS System

The On-Board Inert Gas Generation System (OBIGGS) prevents fuel tank explosions by replacing ullage with nitrogen-enriched air. Under MIL-STD-882E, a failure of OBIGGS is Severity Cat I (catastrophic — fuel tank explosion). Probability was classified as D (Remote) given redundant sensors and pre-flight checks, giving a risk index of 8 (Serious). The programme invested in a secondary inerting monitor to reduce probability to E (Improbable), moving the risk to 12 (Medium) and lowering the required acceptance authority. This drove the system architecture decision to add the backup monitor.

MIL-STD-810H — Environmental Engineering & Laboratory Tests

The definitive environmental testing standard — now used extensively in commercial product ruggedisation (laptops, phones, industrial equipment) not just defence. 29 test methods covering every environmental stress a product might encounter.

Method | Test | Typical Conditions | Real-World Stress
500.6 | Low Pressure (Altitude) | 70,000 ft equivalent | Aircraft cargo bay, unpressurised
501.7 | High Temperature | +71°C storage, +49°C operating | Desert deployment, vehicle interior
502.7 | Low Temperature | −51°C storage, −32°C operating | Arctic operations, stratospheric
507.6 | Humidity | 95% RH, 30 days cycling | Tropical jungle, ship deck
509.7 | Salt Fog | 5% NaCl, 96 hrs | Naval/maritime environment
510.7 | Sand & Dust | 1.06 g/m³ dust concentration | Middle East desert, helicopter downwash
514.8 | Vibration | Tailored PSD per platform | Vehicle road, aircraft turbulence
516.8 | Shock | Half-sine, sawtooth, trapezoidal | Rough handling, explosive nearby

MIL-STD-785B — Reliability Programme for Systems & Equipment

The lifecycle reliability management standard. Defines the tasks, reviews, and evidence a contractor must demonstrate across programme phases from concept through production.

Task 101–106
Reliability programme planning, monitoring, control, failure reporting (FRACAS), corrective action
Task 201–205
Design guidelines, stress analysis, sneak circuit analysis, effects of functional testing
Task 301–303
Reliability development testing, environmental stress screening (ESS), reliability qualification
💡

FRACAS (Failure Reporting, Analysis, and Corrective Action System) is the heart of MIL-STD-785B. Every failure in test or field must be formally reported, root-caused, and corrective action verified — creating a closed feedback loop that drives reliability growth throughout the programme.

AS9100D — Aerospace Quality Management System

ISO 9001 + 60+ aerospace-specific requirements. The entry ticket for Boeing, Airbus, Lockheed Martin, Northrop Grumman, and most tier-1 primes. Mandatory for the civil aerospace supply chain globally.

Key additions over ISO 9001
  • First Article Inspection (FAI) per AS9102
  • Foreign Object Damage/Debris (FOD) prevention
  • Key Characteristics (KC) identification and control
  • Configuration management requirements
  • Counterfeit parts prevention (clause 8.1.4)
  • On-time delivery as a quality metric
Certification hierarchy
  • AS9100D — Design & manufacture
  • AS9110C — MRO / maintenance organisations
  • AS9120B — Distributors / stockists
  • Audited by IAQG-accredited CBs (BSI, Bureau Veritas, etc.)
  • Certificate validity: 3 years with annual surveillance

Military Standards Quick Reference

Standard | Topic | Commercial Equivalent | Status
MIL-STD-1916 | DoD Preferred Acceptance Methods | ISO 2859 / ANSI Z1.4 | Active (1996)
MIL-STD-785B | Reliability Program Mgmt | IEC 60300-2 | Cancelled (1998), still widely used
MIL-HDBK-217F | Electronic Reliability Prediction | IEC TR 62380, Telcordia SR-332 | Active (frozen)
MIL-STD-1629A | FMECA | AIAG FMEA, SAE J1739 | Cancelled (1998), still widely used
MIL-STD-105E | Attribute Acceptance Sampling | ANSI/ASQ Z1.4, ISO 2859 | Cancelled (1995)
MIL-STD-414 | Variables Acceptance Sampling | ANSI/ASQ Z1.9, ISO 3951 | Cancelled
MIL-STD-45662A | Calibration Systems | ISO/IEC 17025, ISO 10012 | Cancelled
MIL-STD-882E | System Safety | IEC 61508, SAE ARP4761 | Active
MIL-STD-810H | Environmental Testing | IEC 60068, RTCA DO-160 | Active
AS9100D | Aerospace QMS | ISO 9001 + Aerospace CSR | Active (Rev D)
AQAP-2110 | NATO Quality Assurance | ISO 9001 + NATO CSR | Active (Ed. 3)
📌

MIL-STD-1916 supersedes MIL-STD-414 (variables sampling) and MIL-STD-1235 (single- and multi-level continuous sampling) for DoD use. The key difference from MIL-STD-105E: 1916 uses a zero-acceptance criterion (Ac = 0 always) versus 105E's AQL-based accept numbers. 1916 is philosophically aligned with prevention and SPC; 105E was detection-based.

Acceptance Sampling Theory — Errors, AOQ, AOQL, ATI & Dodge-Romig

The mathematics behind acceptance sampling — understanding what happens to quality as lots pass through a sampling plan, and the trade-offs between producer and consumer risk.

Type I & Type II Errors — Producer's Risk vs Consumer's Risk

Decision \ Reality | Actual: Good Lot | Actual: Bad Lot
Decision: Accept | ✅ Correct | ✗ Type II Error (β)
Decision: Reject | ✗ Type I Error (α) | ✅ Correct

Aspect | Type I Error (α) | Type II Error (β)
Name | Producer's risk | Consumer's risk
What happens | Good lot rejected — producer loses | Bad lot accepted — consumer receives defectives
Fire alarm analogy | False alarm — inconvenience | Missed fire — disaster
Control method | Fixed at a pre-determined level (1%, 5%, 10%) | Controlled to <10% by appropriate sample size
Simple definition | Innocent declared guilty | Guilty declared innocent
💡

As α (producer's risk) increases (e.g. 0.01→0.05), β (consumer's risk) goes down — they trade off against each other. To reduce BOTH Type I and II errors simultaneously: increase the sample size.

RQL / LTPD — Rejectable Quality Level

RQL = Rejectable Quality Level (= LTPD = LQL)

The defect rate we want to reject a high proportion of the time (controlled by β, the consumer's risk).

Consumer Risk β = P(accept lot | lot has RQL% defectives)

Example: β = 0.10, RQL = 8% means we would accept lots containing 8% defectives at most 10% of the time. Equivalently: 90% of lots at RQL quality will be rejected.

AQL vs RQL on the OC Curve

The OC Curve has three zones:

  • Acceptable quality zone — near AQL, high P(accept)
  • ⚠️ Indifferent zone — between AQL and RQL, intermediate P(accept)
  • Rejectable quality zone — near RQL/LTPD, low P(accept)

Increasing n (sample size) steepens the OC curve — narrows the indifferent zone and brings it closer to the ideal step function.
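
The OC curve is just a binomial tail sum, so the steepening effect is easy to demonstrate. A minimal sketch comparing two illustrative plans with the same Ac/n ratio (the larger sample discriminates more sharply between 1% and 5% lots):

from math import comb

def p_accept(n, ac, p):
    """OC curve point: P(accept) = P(defectives in sample <= Ac), binomial."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(ac + 1))

for n, ac in [(50, 1), (200, 4)]:
    print(n, ac, {p: round(p_accept(n, ac, p), 3) for p in (0.01, 0.03, 0.05)})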

Interactive OC Curve — See How n and Ac Shape Acceptance Probability

Adjust the sample size (n) and acceptance number (Ac) to see how the Operating Characteristic curve changes. A steeper curve gives sharper discrimination between good and bad lots — but costs more to inspect.

[Interactive widget: defaults n = 80, Ac = 2, AQL = 1.5%. Readouts: Pa at AQL, producer's risk α, AOQL (approx.), and RQL at β = 10%.]

AOQ, AOQL & ATI Formulas

AOQ — Average Outgoing Quality

The average quality of outgoing product, accounting for the fact that rejected lots are screened 100% and returned perfect.

AOQ = p × Pₐ × (N−n)/N
Simplified: AOQ ≈ p × Pₐ

p = incoming defect rate, Pₐ = probability of acceptance, N = lot size, n = sample size

AOQL — Average Outgoing Quality Limit

The maximum (worst) AOQ for a given sampling plan — the peak of the AOQ curve. As incoming quality deteriorates beyond AOQL, AOQ actually improves because more lots get rejected and 100% screened.

AOQL = max(AOQ) across all p values

The Dodge-Romig sampling plan uses AOQL as its design criterion.

ATI — Average Total Inspection

Total average number of pieces inspected per lot, combining the sample (from accepted lots) and 100% screening (from rejected lots).

ATI = n·Pₐ + N·(1−Pₐ)
= n + (1−Pₐ)(N−n)

ATI increases sharply as incoming quality deteriorates — minimising ATI is the design goal of Dodge-Romig.

Worked Example — AOQ Calculation

Sampling plan: N = 1,000, n = 80, Ac = 3. Incoming lot has 2% defectives.

Pₐ = POISSON.DIST(3, 80×0.02, TRUE) = POISSON.DIST(3, 1.6, TRUE) = 0.921
AOQ = p × Pₐ = 0.02 × 0.921 = 0.0184 (1.84%)
ATI = n + (1−Pₐ)(N−n) = 80 + (1−0.921)(1000−80) = 80 + 0.079×920 = 80 + 72.7 = 152.7 pieces/lot

Interpretation: The average outgoing quality is 1.84% defective — slightly better than incoming (2%) because 8% of lots are 100% screened and returned perfect.
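
The same numbers fall out of a few lines of Python, using the Poisson approximation the worked example uses (the brute-force AOQL scan at the end searches incoming quality levels from 0.1% to 15%):

from math import exp, factorial

def pa_poisson(n, ac, p):
    """P(accept) under the Poisson approximation, mean = n*p."""
    m = n * p
    return sum(exp(-m) * m**d / factorial(d) for d in range(ac + 1))

N, n, Ac, p = 1000, 80, 3, 0.02
Pa = pa_poisson(n, Ac, p)
AOQ = p * Pa                        # simplified form; exact multiplies by (N-n)/N
ATI = n + (1 - Pa) * (N - n)
print(f"Pa = {Pa:.3f}, AOQ = {AOQ:.4f}, ATI = {ATI:.1f}")   # ~0.921, 0.0184, ~152

AOQL = max(q * pa_poisson(n, Ac, q) for q in (i / 1000 for i in range(1, 151)))
print(f"AOQL ~ {AOQL:.4f}")         # worst-case outgoing quality for this plan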

Inspection Levels — ANSI/ASQ Z1.4

Level | Sample size | When to use
Level I | Smaller n | Less discrimination needed — lower risk, trusted supplier
Level II | Standard n | Default / normal use — used unless otherwise specified
Level III | Larger n | Greater discrimination — critical characteristics or new suppliers
S-1 to S-4 | Small n | Special levels — small sample sizes when large sampling risks are acceptable. S-4 > S-3 > S-2 > S-1 in sample size.
💡

Sample size relationship: n(Level III) > n(Level II) > n(Level I). A larger sample size steepens the OC curve — better discrimination between good and bad lots, but higher inspection cost. The relationship between lot size and sample size is defined in Table I (code letters A–R).

Dodge-Romig Sampling Plans

Attribute | MIL-STD-105 / ANSI Z1.4 | Dodge-Romig
Basis | AQL — protects the producer | LTPD (consumer's risk) or AOQL — protects the consumer
Sampling types | Single, Double, Multiple | Single and Double only
Primary design goal | Ensure high-quality lots are accepted at a defined rate | Minimise ATI — least total inspection effort for a given quality protection level
Requires | AQL specification | Estimate of process average (from recent data); if unknown, use largest table value
Example | AQL = 1.5%, N = 1000 → n = 80, Ac = 3 | AOQL = 3%, N = 1000, process avg = 1.5% → n = 44, c = 2, LQL = 11.8%
💡

Dodge-Romig is the preferred plan when the consumer wants assurance that the outgoing quality will not exceed a stated limit (AOQL) regardless of incoming quality — ideal for critical product or safety-related items.

Risk Analysis

FMEA & RPN — Failure Mode & Effects Analysis

FMEA is the discipline of imagining every way something can go wrong — before it does. Two distinct types: Design FMEA catches failures born in the blueprint; Process FMEA catches failures born on the shop floor.


What is FMEA and Why Does It Matter?

FMEA forces you to think about failure proactively — before a customer finds it in the field, before a recall, before someone gets hurt. It is the bridge between design intent and production reality.

The core question for every item on the FMEA: "In what ways could this fail, what happens when it does, and what are we doing about it?"

📊 The FMEA Logic Chain — Every Row Answers These Three Questions
[Diagram: the FMEA logic chain. CAUSE (why does it happen? the root of the problem) leads to the FAILURE MODE (how does it fail? the specific malfunction), which causes the EFFECT (customer impact, drives the Severity rating). Severity (S) × Occurrence (O) × Detection (D) = RPN, the priority score.]
RPN Formula
RPN = S × O × D
Range: 1–1000
Severity (S)
1 = No effect
10 = Safety/regulatory
Occurrence (O)
1 = Unlikely
10 = Inevitable
Detection (D)
1 = Almost certain
10 = No detection

DFMEA vs PFMEA — Two Different Questions

📐
Design FMEA (DFMEA)

"Is the design itself capable of meeting its intended function under all expected use conditions?"

Owner: Design Engineering. Done during concept/development phase. Corrective actions = design changes.

🏭
Process FMEA (PFMEA)

"Can the manufacturing process consistently produce a conforming part without creating a defect?"

Owner: Manufacturing Engineering. Done pre-launch. Corrective actions = process controls, poka-yokes.

⚠️

The RPN trap. Two different failure modes can share the same RPN yet have radically different risk profiles. S=10, O=1, D=1 (RPN=10) is a potential safety catastrophe; S=2, O=5, D=1 (RPN=10) is inconsequential. Always act on high Severity first, regardless of RPN.

🔑 When to Do FMEA

  • New design or product

    DFMEA during concept phase when changes are cheap. PFMEA before production launch.

  • Design/process changes

    Update affected FMEA rows whenever a change is made — even "minor" changes.

  • Field failure or warranty

    Use FMEA to document and prevent recurrence. Add new failure modes discovered.

  • PPAP requirement

    DFMEA (if design owner) + PFMEA both required for Level 3 PPAP submission.

Design FMEA (DFMEA) — Catching Failures in the Blueprint

DFMEA asks: "Even if we manufacture this perfectly, does the design itself do what it's supposed to do under all conditions?" It lives in the design engineer's world — materials, tolerances, geometry, load cases, wear-out mechanisms, edge cases.

📌

DFMEA is required when the supplier owns the product design. If you are making a part to a customer drawing, a PFMEA is sufficient. If you designed the part, you must also do a DFMEA. Required for PPAP Level 3 when design responsibility is with the supplier.

The DFMEA Thought Process

  • 1

    Define the Function

    For each component, state its intended function precisely. "Transmit torque of 50±2 Nm from input shaft to output shaft without slippage under 100,000 cycles at 80°C."

  • 2

    Identify Failure Modes

    How could this component fail to perform its function? Examples: fracture, deformation, corrosion, excessive wear, loss of insulation, contact intermittent.

  • 3

    Determine Effects

    What does the next-higher assembly experience when this fails? What does the customer ultimately experience? Rate Severity 1–10.

  • 4

    Find Root Causes

    Design-level causes: insufficient material strength, wrong tolerance stack, inadequate surface finish spec, missing environmental protection, wrong material selection.

  • 5

    List Current Design Controls

    Prevention: Design guidelines, material specs, analysis (FEA, fatigue). Detection: Prototype testing, DVP&R, simulation, inspection. Rate Occurrence and Detection.

  • 6

    Take Action & Re-evaluate

    Implement design changes. Update specs, drawings, test plans. Recalculate RPN. Verify effectiveness.

Worked Example — EV Battery Cell Aluminium Casing

🔋

Function: Aluminium casing must contain electrolyte, withstand 50 bar internal pressure during thermal runaway, and maintain electrical isolation from adjacent cells for the 15-year vehicle life.

📋 DFMEA — EV Battery Cell Casing (selected rows)
Function | Failure Mode | Effect on Customer | S | Root Cause | O | Detection Control | D | RPN
Contain electrolyte | Casing crack / leak | Electrolyte contact → fire risk → vehicle loss | 10 | Wall thickness < 0.8 mm at weld seam; fatigue from thermal cycling | 3 | FEA fatigue analysis; 1,000-cycle pressure test at DVP stage | 2 | 60
Withstand 50 bar pressure | Burst / catastrophic rupture | Thermal runaway propagation → vehicle fire | 10 | Insufficient yield strength spec; wrong alloy grade selected | 2 | Burst test per UL 2580; FEA pressure simulation; material cert review | 1 | 20
Maintain electrical isolation | Dielectric breakdown | Cell-to-cell short → fire / BMS fault | 9 | Coating thickness < 20 µm at edges; holiday defects in anodising | 4 | Hi-pot test 100% incoming; SEM cross-section at sampling frequency | 3 | 108
15-year corrosion resistance | Pitting corrosion at weld | Gradual electrolyte seep → premature capacity fade | 6 | Wrong filler wire alloy in laser weld; porosity from humidity contamination | 4 | Salt spray test per ISO 9227; weld procedure qualification | 4 | 96
🎯

DFMEA Corrective Action Priority for this example: Row 3 (RPN=108, S=9) and Row 1 (RPN=60, S=10) are both flagged. The dielectric breakdown row is prioritised because S=9 AND the combined RPN is highest. Actions: increase anodising spec to ≥25 µm, add 100% hi-pot in design verification, update drawing callout.

Process FMEA (PFMEA) — Catching Failures on the Shop Floor

PFMEA asks: "Even with a perfect design, how could our manufacturing process build it wrong?" It lives in the process engineer's world — machines, operators, tooling, fixtures, parameters, environment, and measurement systems.

📌

PFMEA is always required. Whether you own the design or not, you always own your process. PFMEA is linked directly to the Process Flow Diagram and drives the Control Plan — these three documents must be consistent with each other.

PFMEA is Linked to Three Documents

📊 The Quality Triad — Process Flow, PFMEA, and Control Plan Must Be Consistent
[Diagram: the Quality Triad. The Process Flow defines process steps and parameters; those steps flow into the Process FMEA, which risk-ranks each step and assigns detection controls; the controls flow into the Control Plan, which documents what to monitor and how to react. All three must be consistent: step numbers, characteristic names, and controls must match exactly.]

PFMEA Thought Process

  • 1

    Map Every Process Step

    Start from the Process Flow Diagram. Each operation becomes one or more PFMEA rows. Be specific: "Laser weld casing" not just "welding."

  • 2

    State the Process Function

    What is this step supposed to achieve? "Weld casing at 1.5 kW, 3.5 m/min to achieve ≥ 0.8 mm penetration with ≤ 0.1 mm porosity."

  • 3

    Identify Failure Modes

    Ways the process step could go wrong: under-weld, over-weld, porosity, misalignment, wrong parameters, incorrect part seated, fixture worn.

  • 4

    Assess Effects

    What is the impact on the next operation? On the final customer? Rate Severity. Separate internal (scrap/rework) from external (field failure).

  • 5

    Find Process Causes

    Process-level causes (not design): machine wear, incorrect setup, operator error, wrong material lot, ambient temperature change, gage drift.

  • 6

    List Controls → Rate O and D

    Prevention: SPC, poka-yoke, maintenance plan, training. Detection: 100% visual, CMM check, functional test, SPC chart. Rate Occurrence and Detection honestly.

Worked Example — Laser Weld Station (Battery Cell Casing)

🏭

Process Step: Laser weld aluminium casing lid to body. Process parameters: Power = 1.5 kW, Speed = 3.5 m/min, Focus offset = 0 mm. Key characteristic: weld penetration ≥ 0.8 mm, porosity ≤ 0.1 mm dia.

📋 PFMEA — Laser Weld Station (selected rows)
Process Function | Failure Mode | Effect on Customer | S | Cause | O | Current Control | D | RPN
Weld penetration ≥ 0.8 mm | Under-penetration (< 0.8 mm) | Casing leak in field → electrolyte contact → fire | 10 | Laser power drift below threshold; focus offset shift; contaminated optic | 4 | SPC on laser power; cross-section destructive test 1/shift; weekly lens cleaning | 5 | 200
Porosity ≤ 0.1 mm dia. | Excess weld porosity | Reduced seal strength → gradual leak → capacity fade | 7 | Surface contamination (oil, moisture); shielding gas flow low; wrong travel speed | 5 | X-ray inspection 100% per lot; shielding gas flow alarm; pre-clean station | 3 | 105
Weld path alignment ±0.1 mm | Weld off-seam | Incomplete seal → early leak in service | 9 | Fixture wear > 0.05 mm; incorrect seam-tracking calibration | 2 | Vision system seam-tracker; fixture Cmk ≥ 1.67 validated monthly | 2 | 36
Heat input to cell ≤ 80°C | Thermal damage to cell chemistry | Premature capacity degradation; early cell death | 8 | Excessive weld speed reduction; multiple re-welds; coolant system failure | 2 | Thermocouple on fixture; weld parameter lockout; coolant flow alarm | 2 | 32
🚨

Row 1 (RPN=200, S=10) demands immediate action. Recommended actions: ① Install inline laser power monitoring with automatic stop if power deviates >2% for >50 ms. ② Increase cross-section check from 1/shift to 1/100 units for 4 weeks until process is validated. ③ Add daily optics cleaning to PM schedule. Target RPN after actions: S=10, O=2, D=2 → RPN=40.

Interactive RPN Calculator

Drag the sliders to set Severity, Occurrence, and Detection. RPN and Action Priority update instantly. Remember: S = 9 or 10 always requires action, regardless of RPN.

[Interactive calculator: example inputs S = 7 (1 = no effect … 10 = safety/regulation), O = 5 (1 = remote, <1/1.5M … 10 = very high, ≥1/2), D = 5 (1 = almost certain detect … 10 = no control) → RPN = 175, HIGH RISK — mandatory action required.]

RPN < 50 | Low Risk | Document and monitor
RPN 50–125 | Medium Risk | Review & improve if feasible
RPN > 125 | High Risk | Mandatory action — no exceptions
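
The zone thresholds above, plus the S = 9/10 override, fit in one small function. A sketch using exactly these cut-offs:

def rpn_zone(s, o, d):
    """Classify per the zones above; S of 9 or 10 always mandates action."""
    rpn = s * o * d
    if s >= 9 or rpn > 125:
        return rpn, "High Risk: mandatory action"
    if rpn >= 50:
        return rpn, "Medium Risk: review & improve if feasible"
    return rpn, "Low Risk: document and monitor"

print(rpn_zone(7, 5, 5))    # (175, 'High Risk: mandatory action')
print(rpn_zone(10, 1, 1))   # RPN only 10, but S=10 forces action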

Severity / Occurrence / Detection Rating Scales (AIAG PFMEA)

Severity (S)
S | Effect | Criteria
10 | Hazardous — no warning | Safety issue, regulatory non-compliance. Failure without warning.
9 | Hazardous — with warning | Safety issue. Failure with warning before occurrence.
8 | Very High | System inoperable, loss of primary function.
7 | High | System operable, reduced performance. Customer dissatisfied.
6 | Moderate | System operable, comfort item inoperable. Customer discomfort.
5 | Low | System operable, comfort item reduced performance.
4 | Very Low | Fit/finish defect noticed by most customers (70%).
3 | Minor | Fit/finish defect noticed by average customers (50%).
2 | Very Minor | Defect noticed only by discriminating customers (25%).
1 | None | No discernible effect whatsoever.
Occurrence (O)
O | Probability | Approximate Rate
10 | Very High | ≥ 1 in 2
9 | Very High | 1 in 3
8 | High | 1 in 8
7 | High | 1 in 20
6 | Moderate | 1 in 80
5 | Moderate | 1 in 400
4 | Moderate | 1 in 2,000
3 | Low | 1 in 15,000
2 | Low | 1 in 150,000
1 | Remote | ≤ 1 in 1,500,000
Detection (D)
D | Ability to Detect | Typical Control
1 | Almost Certain | Proven poka-yoke — physically impossible to pass
2 | Very High | 100% automated gauge with alarm & stop
3 | High | 100% automated gauge, no automatic stop
4 | Moderately High | SPC with immediate reaction plan
5 | Moderate | SPC — operator reacts to out-of-control signal
6 | Low | 100% manual inspection — variable attribute
7 | Very Low | Random or double sampling only
8 | Remote | Visual inspection only, no documented method
9 | Very Remote | No detection control — will be found by end user
10 | No Control | No inspection. Defect certain to reach customer.
💡

Detection scale is counter-intuitive: D=1 is best (certain to detect before customer), D=10 is worst (no control). The inverse scale trips people up constantly — lower Detection score means better controls. A poka-yoke that makes a defect physically impossible to produce gets D=1.

AIAG-VDA FMEA 2019 — What Changed and Why It Matters

The 2019 AIAG-VDA FMEA Handbook supersedes both AIAG FMEA 4th Edition and VDA Volume 4. It represents the most significant overhaul of automotive FMEA methodology in 25 years.

The Core Problem with Classic RPN

❌ Problem with Classic RPN

S=10, O=1, D=1 gives RPN=10

S=2, O=5, D=1 also gives RPN=10

The first case is a potential safety catastrophe. The second is trivial. Classic RPN treats them identically.

✅ AIAG-VDA Solution: Action Priority

An Action Priority (AP) Table replaces the single RPN number. It uses a three-dimensional lookup — S, O, and D — to determine priority rather than simple multiplication.

S=9/10 always → High AP, regardless of O or D values.

Action Priority (AP) Categories

Priority | Action Required | What it Means
High (H) | Mandatory action | Team MUST identify appropriate actions to improve prevention and/or detection. Management review required. Escalate if no actions identified.
Medium (M) | Recommended action | Team SHOULD identify improvement actions. Management discretion on whether to escalate. Document rationale if no action taken.
Low (L) | At team discretion | Team should consider improvement if easily achievable. Document rationale if no action taken.

Key Changes in AIAG-VDA 2019 vs Classic FMEA

Topic | Classic AIAG FMEA 4th Ed. | AIAG-VDA 2019
Risk metric | Single RPN number (S×O×D) | Action Priority (AP) table — 3-dimensional
Severity 9/10 | May have low RPN, ignored | Always = High AP, always requires action
Process | 5 steps | 7 steps (adds Planning & Preparation, Documentation)
Prevention vs Detection | Single "Current Controls" column | Separate: Prevention Controls + Detection Controls
New FMEA type | Not present | MSR — Monitoring & System Response (functional safety)
Failure chain | Mode → Effect → Cause | Structure Analysis → Function Analysis → Failure Analysis
Standard | AIAG FMEA 4th Ed. (2008) | AIAG-VDA FMEA Handbook (2019, joint)
📌

Transition Note: Many automotive OEMs are migrating to AIAG-VDA 2019 format and will begin requiring it in new PPAP packages. However, the traditional S×O×D RPN approach remains valid for non-automotive applications and is still widely used in military standards (MIL-STD-1629A), medical devices (ISO 14971), and aerospace (SAE ARP4761). When in doubt, confirm the customer's required FMEA format before starting.

Design & Development — ISO 9001:2015 §8.3

ISO 9001:2015 Section 8.3 establishes requirements for controlling the design and development of products and services. 80% of product costs are fixed at the design stage — making rigorous design control the highest-leverage quality activity.

ISO 9001:2015 §8.3 Structure

Clause | Requirement | Key points
§8.3.1 | General | Establish, implement, and maintain a design and development process appropriate to ensure products meet requirements
§8.3.2 | Planning | Determine stages and controls, reviews, responsibilities, interfaces; consider nature, duration, and complexity
§8.3.3 | Inputs | Functional and performance requirements; statutory and regulatory requirements; previous similar designs; standards; potential failure consequences (FMEA, QFD, DFX, DFSS)
§8.3.4 | Controls | Reviews — evaluate results vs requirements (§8.3.4b); Verification — outputs meet inputs (§8.3.4c); Validation — product meets intended use (§8.3.4d)
§8.3.5 | Outputs | Meet input requirements; specify characteristics for provision; include acceptance criteria; identify critical characteristics
§8.3.6 | Changes | Identify, review, and control changes; review effects on constituent parts and already-delivered products

Design Review vs Verification vs Validation

📋 Design Review (§8.3.4b)

Evaluate ability of design results to meet requirements. Typically at 30%, 60%, 90% milestones. Multi-disciplinary for complex products. Areas: objectives, assumptions, alternatives, risks, budget, safety, maintainability.

✅ Verification (§8.3.4c)

Ensure design outputs meet design input requirements. "Are we building it right?" Checks design-to-spec conformance.

Outputs ⊇ Inputs
🎯 Validation (§8.3.4d)

Ensure products meet requirements for intended use. "Are we building the right thing?" Tests against real-world customer use.

Product ⊇ Customer need

Design for X (DFX) — Design Excellence Disciplines

80% of product costs are fixed at the design stage. DFX disciplines optimise a specific aspect of the product. Note they sometimes conflict — integrated product development teams balance competing objectives.

DFX | Full name | Primary objective | Key actions
DFM | Design for Manufacturing | Reduce manufacturing cost and difficulty | Reduce parts count; minimise fasteners; use standard parts (lower cost, shorter lead time, more reliable)
DFA | Design for Assembly | Ease and speed of assembly | Reduce parts; self-locating features; single-direction assembly
DFMaint | Design for Maintainability | Reduce downtime and maintenance cost | Easy access to serviceable parts; standardised replacement parts; reduced skill level; easy fault detection
DFR | Design for Reliability | Extend product useful life | Design for useful life; consider infant mortality and wear-out; remove weaknesses via FMEA; stress and derating
DFC | Design for Cost | Minimise total lifecycle cost | Use standard components; optimise tolerances; design for reuse and modularity
DFLog | Design for Logistics | Ease transport, storage, and tracking | Easy transport and storage; barcodes/traceability; standardisation; reusable packaging
DFEnv | Design for Environment | Minimise environmental impact | Design for repair, reuse, recycling; minimise hazardous materials; easy disassembly

Design for Six Sigma (DFSS) — Methodologies

DFSS applies to new product/process design where no existing process exists to improve. Unlike DMAIC (which improves existing processes), DFSS builds quality in from concept.

DMADV
Define: Process/design goals; identify CTQs
Measure: Measure CTQ aspects; establish baseline
Analyse: Analyse designs; identify best alternatives
Design: Detail design of product or process
Verify: Verify the chosen design meets requirements
DMADOV
Define: Goals and customer needs
Measure: CTQs and performance gaps
Analyse: Design alternatives
Design: Detail the design
Optimise: Refine — parameter and tolerance design
Verify: Verify and validate the design
IDOV
Identify: Voice of Customer; translate to CTQs
Design: Detail design of product or process
Optimise: Analyse and optimise design alternatives
Verify: Verify the chosen design
💡

IDOV explicitly starts with VOC — most customer-centric of the three DFSS methodologies

Technical Drawings, Tolerances & GD&T

Technical drawings are the universal language between design and manufacturing. The quality engineer must read drawings, understand tolerances, and interpret GD&T symbols — skills exercised daily in engineering practice.

1st Angle vs 3rd Angle Projection

Attribute | 1st Angle (Europe / ISO) | 3rd Angle (USA / ASME)
Object position | Object in the first quadrant | Object in the third quadrant
View relationship | Object between observer and projection plane | Projection plane between observer and object
Projection plane | Non-transparent | Transparent
Top view placement | Below front view | Above front view
Standards | ISO / BS / DIN | ASME / ANSI

Title Block Contents

Mandatory | Additional
Organisation name/logo | Bill of materials
Drawing title & number | Notes & zone references (e.g. A5, B3)
Sheet & revision number | Finish / Weight / Heat treatment
Approvals (Prepared/Checked/Approved) | General tolerances
Units, scale, projection symbol | Surface roughness

Engineering Drawing Line Types

Line type | Purpose
Construction (light, thin) | Auxiliary construction, projection lines
Outline (thick continuous) | Visible boundary of the object
Hidden (thin dashed) | Edges not visible from the current view
Centreline (chain) | Axis of symmetry, hole centres, pitch circles
Dimension line | Shows extent of a dimension with arrowheads
Break line (zigzag) | Object continues beyond drawn portion
Cutting plane (thick chain) | Defines plane of a section view
Hatch / section lines | Material cross-section in section views

Dimensioning Methods & Tolerance Fit Types

Dimensioning Methods

Method | Description | Risk
Chain | Dimensions placed end-to-end | Tolerance accumulation / stack-up
Parallel | Multiple dimension lines all from the same datum; no accumulation | More space required
Running | Parallel style but superimposed on one line; origin point marked | Can be harder to read

MMC, LMC & Fit Types

Term | Definition | Example
MMC | Maximum material within tolerance | Smallest hole, largest pin
LMC | Least material within tolerance | Largest hole, smallest pin
Clearance fit | Always space between mating parts | Sliding bearings
Interference fit | Parts always interfere — press/shrink fit | Press fits, permanent assembly
Transition fit | May be clearance or interference depending on actual dims | Locating fits

GD&T — Geometric Dimensioning & Tolerancing (ASME Y14.5)

GD&T is a symbolic language (ASME Y14.5-2009) that defines geometry according to functional limits. It provides a universal language between supplier, checker, and buyer — eliminating ambiguity in conventional ± tolerances.

Datum Reference Frame & Degrees of Freedom

A Datum is a perfect theoretical point, line, or plane. A Datum Feature is the physical surface where the datum is located. Three perpendicular datum planes constrain all 6 degrees of freedom:

Datum | DOF constrained | Running total
Primary (A) | 3 (2 rotations + 1 translation) | 3 of 6
Secondary (B) | 2 (1 rotation + 1 translation) | 5 of 6
Tertiary (C) | 1 (final translation) | 6 of 6 — fully constrained

GD&T Characteristic Categories

Category | Characteristics (ASME Y14.5)
Form | Flatness, Straightness, Circularity, Cylindricity
Orientation | Angularity, Perpendicularity, Parallelism
Location | True Position, Concentricity, Symmetry
Runout | Circular Runout, Total Runout
Profile | Profile of a Line, Profile of a Surface
💡

Flatness example: A glass sheet 1000×500mm with flatness 0.2mm means the entire surface must lie within two parallel planes separated by 0.2mm — independent of the ±5mm size tolerance.

Robust Design & Signal-to-Noise Ratios

Robust design improves quality by minimising the effects of variation without eliminating its causes. Taguchi's signal-to-noise (S/N) ratios identify control factor settings that make the product insensitive to noise.

Control Factors vs Noise Factors

Type | Definition | Examples
Control Factors | Can be set and controlled by the engineer | Welding: electrode type, position, preheat
Outer Noise | Consumer use conditions — difficult/expensive to control | Temperature, humidity, vibration, UV
Inner Noise | Product deterioration over time | Rusting, oxidation, wear, degradation
Between-Product Noise | Piece-to-piece variation | Dimensional variation, material property variation

Three Design Stages (Taguchi)

① Conceptual Design

Select the best design concept from alternatives using feasibility and technology benchmarking.

② Parameter Design ← most important

Identify control factor settings that maximise SNR — make the product insensitive to noise. Uses orthogonal arrays. This is where Taguchi's method adds the most value.

③ Tolerance Design

Tighten tolerances only where necessary — reduces cost by avoiding unnecessarily tight tolerances everywhere.

Signal-to-Noise Ratio — What It Means Visually

SNR is the ratio of useful signal (what you want) to noise (what you don't want). A higher SNR means the product's response is dominated by the intended behaviour, not by variation. Taguchi's insight: maximise SNR always — regardless of whether the goal is smaller, larger, or on-target.

What SNR Measures — Low SNR vs High SNR
[Diagram: low SNR vs high SNR. When signal ≪ noise, the true response is hard to detect; when signal ≫ noise, the output is consistent and predictable.]
Effect of Maximising SNR — Before vs After Robust Design
[Diagram: response distributions (e.g. weld strength) before and after robust design. Before: low SNR, high variance, tails outside LSL/USL. After: high SNR, low variance, centred on target. The mean is unchanged; only the variance is reduced.]
The key insight of robust design: the mean does not need to move — only the variance shrinks. By choosing control factor levels that maximise SNR, the product becomes insensitive to noise, so the distribution tightens around the target.

Three SNR Formulas — With Visual Context

The goal is always to maximise SNR. Taguchi unified three different engineering objectives into one consistent framework by choosing formulas where the maximum SNR always corresponds to the desired outcome.

Smaller is Better

Ideal = 0 or minimum. Wear, defects, contamination, shrinkage, response time.

S/N = −10 log₁₀[ (1/n) Σ Yᵢ² ]
Higher S/N = smaller mean AND smaller variance
Larger is Better

Ideal = maximum. Tensile strength, yield, fuel efficiency, pull force, adhesion.

S/N = −10 log₁₀[ (1/n) Σ (1/Yᵢ²) ]
1/Y² penalises small values — maximising S/N maximises Y
Nominal is Better

Specific target value. Dimensions, resistance, weight, temperature, voltage output.

S/N = 10 log₁₀( Ȳ² / s² )
Ȳ/s is the reciprocal of the coefficient of variation — higher = tighter around the target
💡

Key insight: All three SNR formulas use log base 10 (decibels). A higher SNR always means a more robust product. The sign convention ensures that maximising SNR always corresponds to the engineering objective — this is Taguchi's elegant unification of the three cases.
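
All three ratios are one-liners in code. A minimal sketch (the four weld-strength readings are hypothetical):

from math import log10
from statistics import mean, stdev

def snr_smaller(ys):   # smaller-is-better
    return -10 * log10(sum(y * y for y in ys) / len(ys))

def snr_larger(ys):    # larger-is-better
    return -10 * log10(sum(1 / (y * y) for y in ys) / len(ys))

def snr_nominal(ys):   # nominal-is-best
    return 10 * log10(mean(ys) ** 2 / stdev(ys) ** 2)

welds = [51.2, 49.8, 50.5, 50.1]    # hypothetical weld strengths, MPa
print(f"larger-is-better: {snr_larger(welds):.2f} dB")
print(f"nominal-is-best:  {snr_nominal(welds):.2f} dB")

Comparing these values across candidate control-factor settings, and keeping the settings with the highest SNR, is the core move of parameter design.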

Risk Management

Risk Management

A structured approach to identifying, analyzing, and responding to uncertainty. Covers risk definitions (ISO 31000 & ISO 9000:2015), the full 5-step risk management process, qualitative and quantitative analysis tools including the Probability & Impact Matrix, and response strategies for both negative and positive risks.

Risk: Definitions & Key Concepts

Risk has two authoritative definitions in the quality engineering world. Understanding the nuance between them — and how they relate to opportunities and issues — is fundamental to every practitioner.

ISO 31000:2018

Effect of uncertainty on objectives

The enterprise risk management standard. Broad definition applicable to any organization at any level — strategic, operational, project, or product.

ISO 9000:2015

Effect of uncertainty

The quality management vocabulary standard. An effect is a deviation from the expected — positive or negative. Risk is characterized by potential events, consequences, and their likelihood of occurrence.

Term | Definition | Key Distinction
Risk | Effect of uncertainty on objectives; can be positive or negative | Future event — has not yet occurred
Opportunity | A positive risk — uncertainty with a favorable effect on objectives | You want to maximize these; exploit them
Issue | A risk that has already occurred | No longer a future uncertainty — a current problem requiring immediate response
Threat | A negative risk — uncertainty with an unfavorable effect on objectives | You want to minimize, transfer, or avoid these
Risk Appetite | The amount and type of risk an organization is willing to pursue or accept | Set by leadership; informs prioritization thresholds
Residual Risk | The risk remaining after risk responses have been implemented | Even after mitigating, some risk always remains

Risk vs. Issue: Risk = future potential event. Issue = risk that has materialised. Once a risk occurs, it transitions to an issue and requires a workaround or corrective action, not a contingency plan.

Why Take Risk?

⚖️
Risk vs. Reward

There is always a balance between risk and reward. Managing risk means finding the optimal point — not eliminating all risk.

📈
More Risk → More Reward?

Generally true — but not always. Higher risk does not guarantee higher reward. Smart risk management seeks better returns per unit of risk taken.

🎯
Optimize — Don't Eliminate

The goal is more rewards with less risk — achieved through systematic identification, analysis, and response planning.

ISO 9000:2015 — Key Nuances

  • An effect is a deviation from the expected — positive or negative
  • Risk is often characterized by reference to potential events and consequences, or a combination
  • Risk is often expressed as a combination of consequences × likelihood
  • The word "risk" is sometimes used only for negative consequences — but ISO 9000 explicitly includes positive effects

Risk Management: The 5-Step Process

Risk management is the identification, assessment, and prioritization of risks (positive or negative) followed by coordinated and economical application of resources to minimize, monitor, and control the probability and/or impact of unfortunate events — or to maximize the realization of opportunities.

🔄 Risk Management — 5-Step Process Flow
1. Plan Risk Management → 2. Identify Risks → 3. Analyze Risks → 4. Plan Risk Response → 5. Monitor & Control Risks
Step | Process | Key Activities | Output
1 | Plan Risk Management | Define risk terms; define roles & responsibilities; select tools & templates; establish how to identify, analyze, respond to, and monitor risks | Risk Management Plan
2 | Identify Risks | Systematic, methodic group process involving management, employees, customers, and other stakeholders; use brainstorming, FMEA, SWOT, Ishikawa | Risk Register
3 | Analyze Risks | Qualitative (P&I Matrix — quick, subjective) and/or quantitative (EMV, Monte Carlo, Decision Tree — detailed, analytic); prioritize risks | Prioritized Risk List
4 | Plan Risk Response | For negative risks: Avoid, Mitigate, Transfer, Accept. For positive risks: Exploit, Enhance, Share, Accept. Assign risk owners. | Risk Response Plan
5 | Monitor & Control Risks | Periodically review risk register; identify new risks; close resolved risks; conduct risk audits; handle unexpected risks with workarounds | Updated Risk Register; Workarounds
💡

Risk Management is iterative, not sequential: Although presented as 5 steps, risk management is a continuous loop. New risks emerge throughout a project or product lifecycle. The risk register is a living document that must be reviewed regularly — not created once and filed away.

Step 1 Detail: Plan Risk Management

What to Define
  • ✦ Risk-related terms and definitions
  • ✦ Roles and responsibilities (Risk Owner concept)
  • ✦ Tools and templates for risk management
  • ✦ Probability & impact scales to be used
  • ✦ Risk thresholds (what score triggers action)
Planning Covers How to…
  • ✦ Identify risks (who, when, tools)
  • ✦ Analyze risks (qualitative and/or quantitative)
  • ✦ Plan risk responses (owners, strategies)
  • ✦ Monitor and control risks (frequency, triggers)

Step 2: Identify Risks

Risk identification is a systematic and methodic process best performed in a group environment. A wide range of stakeholders participate — management, employees, customers, and other interested parties. The output is a Risk Register listing all identified risks.

Key Characteristics

  • Systematic and methodic — not ad hoc
  • Best done in a group environment
  • Involves wide range of stakeholders
  • Identifies both positive and negative risks
  • Iterative — risks can emerge at any time

Who Participates?

Management · Employees · Customers · Suppliers · Other Stakeholders · Subject Matter Experts

Tools for Risk Identification

Tool | Type | How It's Used for Risk ID | Best For
Brainstorming | Group technique | Most common approach; free-form idea generation in a group; facilitator captures all risks without judgment | All risk identification; starting point for any risk session
Ishikawa Diagram | Cause & Effect | Systematically explores causes across categories (Man, Machine, Method, Material, Environment, Measurement) | Process risks; identifying root-cause risk categories
Flow Diagram | Process mapping | Map the process; identify each step where something could go wrong — inputs, outputs, handoffs, decision points | Operational and process risks; supply chain risk
SWOT Analysis | Strategic tool | Strengths, Weaknesses, Opportunities, Threats; internal and external risk identification | Strategic and organizational risk; positive risks (opportunities)
FMEA | Failure analysis | Systematically identifies failure modes and their effects; each failure mode is a potential risk | Product/process design risks; manufacturing risks
Checklist / Historical Data | Historical reference | Review lessons learned from previous projects/products; use industry-standard risk checklists | Repeatable processes; established product lines
Expert Interviews / Delphi | Expert elicitation | Individual or structured group interviews; Delphi uses iterative anonymous surveys to converge on consensus | Novel technologies; unique or high-stakes projects

The Risk Register

The risk register is the primary output of the Identify Risks process. It is a living document that is updated throughout all subsequent risk management steps.

Risk Register Field | Description
Risk ID | Unique identifier for each risk
Risk Description | Clear statement of the risk event and its potential cause and effect
Risk Category | Classification (Technical, Schedule, Cost, Scope, External, etc.)
Probability Score | Likelihood of occurrence (added during Analyze step)
Impact Score | Consequence severity if the risk occurs (added during Analyze step)
Risk Score | Probability × Impact (added during Analyze step)
Risk Owner | Person responsible for monitoring and responding to this risk
Response Strategy | Planned approach (Avoid/Mitigate/Transfer/Accept or Exploit/Enhance/Share/Accept)
Response Actions | Specific actions to implement the chosen strategy
Status | Active / Closed / Occurred (became an Issue)

Step 3: Analyze Risks

Risk analysis prioritizes identified risks so that resources and attention can be focused on the highest-priority items. There are two main approaches: qualitative and quantitative.

Qualitative Risk Analysis

Quick and easy to perform. Uses descriptive or ordinal scales. Subjective by nature but valuable for initial prioritization when data is limited or time is short.

  • ✦ Fast and cost-effective
  • ✦ Subjective judgment
  • ✦ Uses rating scales (Low/Medium/High or 1–9)
  • ✦ Primary tool: Probability & Impact Matrix
  • ✦ Good for all risks as initial screen
Quantitative Risk Analysis

Detailed and time-consuming. Uses numerical data to produce a statistical analysis of risk impact. Analytic, data-driven, and defensible.

  • ✦ Requires real data or estimates
  • ✦ Objective and numeric
  • ✦ Tools: EMV Analysis, Monte Carlo, Decision Tree
  • ✦ Used for high-priority risks (from qualitative screen)
  • ✦ Provides probability distributions of outcomes

Quantitative Analysis Tools

Tool | Description | When to Use
Expected Monetary Value (EMV) | EMV = Probability × Impact ($). Calculates the expected financial value of a risk. Positive EMV = opportunity; negative EMV = threat. Sum all EMVs to get overall risk exposure. | Cost/benefit decisions on risk responses; comparing alternative responses; setting contingency reserves
Monte Carlo Analysis | Computer simulation that runs the project/process thousands of times with randomly sampled input values. Produces a probability distribution of outcomes (cost, schedule, etc.). | Complex projects with many interacting risks; when you need confidence intervals on outcomes
Decision Tree | Diagram showing decisions, chance events (with probabilities), and outcomes (with values). Calculate EMV at each branch to determine the best decision path. | Go/no-go decisions; make-or-buy; alternative response strategies; multi-stage decisions under uncertainty
Sensitivity Analysis | Determines which risk variable has the most impact on outcomes. Often visualized as a Tornado Diagram — bars sorted by impact magnitude. | Identifying which risks deserve the most attention; resource prioritization
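
EMV is the easiest of these tools to automate. A sketch over a hypothetical three-risk register (names, probabilities, and dollar impacts invented for illustration; negative impact = threat, positive = opportunity):

risks = [  # (name, probability, impact in $)
    ("Supplier line down",   0.30, -120_000),
    ("Tooling rework",       0.15,  -40_000),
    ("Early PPAP incentive", 0.20,  +25_000),
]
for name, p, impact in risks:
    print(f"{name}: EMV = ${p * impact:+,.0f}")
print(f"Overall risk exposure = ${sum(p * i for _, p, i in risks):+,.0f}")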

FMEA vs. P&I Matrix — Key Comparison

Aspect | FMEA (Risk Priority Number) | Probability & Impact Matrix
Formula | RPN = Severity × Occurrence × Detection | Risk Score = Probability × Impact
Dimensions | 3 dimensions (adds Detection) | 2 dimensions (no Detection factor)
Impact / Severity | Severity (1–10 scale) | Impact (similar concept; often 1–9 scale)
Probability | Occurrence (1–10 scale) | Probability (1–9 or Low/Med/High)
Detection | Detectability score (1–10, inverse) | Not included
Primary Use | Product/process failure analysis | Project/process risk prioritization
Context | Design and process engineering | General risk management

Probability & Impact (P&I) Matrix

The Probability and Impact Matrix is the primary qualitative risk analysis tool. It evaluates each risk on two dimensions — likelihood of occurrence and potential consequence — then combines them into a risk score used for prioritization.

Core Formula
Risk Score = Probability × Impact

Higher scores = higher priority risks requiring more immediate attention and resource allocation.

Sample Probability Scale

Category | Score | Description
Very High | 9 | Risk event expected to occur
High | 7 | Risk event more likely than not to occur
Probable | 5 | Risk event may or may not occur (50/50)
Low | 3 | Risk event less likely than not to occur
Very Low | 1 | Risk event not expected to occur

Sample Impact Scale (by Project Objective)

Objective | Very Low (1) | Low (3) | Moderate (5) | High (7) | Very High (9)
Cost | Insignificant | <10% cost impact | 10–20% cost impact | 20–40% cost impact | >40% cost impact
Schedule | Insignificant | <5% schedule slip | 5–10% schedule slip | 10–20% schedule slip | >20% schedule slip
Scope | Barely noticeable | Minor areas impacted | Major areas impacted | Changes unacceptable to client | Product becomes useless
Quality | Barely noticeable | Minor functions impacted | Client must approve reduction | Quality reduction unacceptable | Product becomes useless

P&I Matrix — Numerical (1–9 Scale)

📊 Probability × Impact Matrix — Risk Scores
Prob ↓ / Impact → | 1 (Very Low) | 3 (Low) | 5 (Moderate) | 7 (High) | 9 (Very High)
9 (Very High) | 9 | 27 | 45 | 63 | 81 ★
7 (High) | 7 | 21 | 35 | 49 | 63
5 (Moderate) | 5 | 15 | 25 | 35 | 45
3 (Low) | 3 | 9 | 15 | 21 | 27
1 (Very Low) | 1 ☆ | 3 | 5 | 7 | 9
Legend: Low Risk — Monitor · Medium Risk — Plan Response · High Risk — Immediate Action · Critical Risk — Top Priority
📝

Exam Example: A risk has Very Low probability (score = 1) but Very High impact (score = 9). Risk Score = 1 × 9 = 9. This falls in the yellow zone — medium priority. Compare to a risk with Moderate probability (5) and Moderate impact (5) = score of 25 — which is higher priority despite neither dimension being extreme.

Step 4: Plan Risk Response

Risk response planning determines how to decrease the possibility of negative risks affecting objectives and how to increase the possibility of positive risks helping objectives. Strategies differ depending on whether the risk is negative (threat) or positive (opportunity).

Negative Risk (Threat) Responses

Goal: Reduce the probability, impact, or both of a negative event affecting your objectives.

🚫 AVOID

Change the plan to eliminate the risk entirely. The risk event becomes impossible.

Examples: Adopt proven approach instead of new one; improve team communication; change project scope

⚠️ MITIGATE

Reduce the probability and/or impact of the risk. The risk may still occur but its effect is lessened.

Examples: Simplify processes; develop prototype; additional inspections; lessons learned from past projects

🔄 TRANSFER

Shift the financial impact of the risk to a third party. The risk still exists — it's moved, not eliminated.

Examples: Insurance; performance warranty; subcontracting; fixed-price contracts

✅ ACCEPT

Acknowledge the risk and take no action — when no action is feasible or impact is too small.

Passive: No contingency plan; monitor and address if/when it occurs
Active: Create contingency plan in advance; monitor triggers
Positive Risk (Opportunity) Responses

Goal: Increase the probability, impact, or both of a positive event benefiting your objectives.

🎯 EXPLOIT

Eliminate uncertainty — ensure the opportunity definitely happens and make maximum use of it.

Examples: Assign best team members; allocate additional resources; fast-track the opportunity

📈 ENHANCE

Increase the probability and/or positive impact of the opportunity. Unlike Exploit, the opportunity may still not occur.

Examples: Add more resources; improve preconditions; invest in enablers of the opportunity

🤝 SHARE

Allocate some or all of the opportunity to a third party best able to capture it.

Examples: Joint venture; partnership; risk-sharing team; consortium or special-purpose company

✅ ACCEPT

Accept the opportunity if it occurs but do not actively pursue it — when the probability and rewards are not attractive enough to justify investment.

Example: A beneficial side-effect of another activity that will be welcomed but not specifically engineered

🔍

Accept applies to both sides: Accept is the only strategy that appears in both negative and positive risk response tables — but its meaning differs. For threats, Accept means tolerating the risk (passive or active). For opportunities, Accept means welcoming the benefit if it naturally occurs without actively pursuing it.

Step 5: Monitor & Control Risks

Risk monitoring and control is an ongoing process throughout the entire project or product lifecycle — not just a phase-end activity. The goal is to keep the risk register current, ensure response plans are being executed, and handle unexpected risks as they arise.

Core Activities

Ongoing Reviews
  • ✦ Regularly review identified risks — are they still relevant?
  • ✦ Identify and add new risks that emerge
  • ✦ Remove risks that are no longer relevant or have been resolved
  • ✦ Track triggers (warning signs) that indicate a risk is about to occur
Risk Audits
  • ✦ Verify that risk response plans are actually being implemented
  • ✦ Confirm effectiveness of implemented responses
  • ✦ Document lessons learned for future risk management
  • ✦ May be conducted by an independent auditor

Handling Unexpected Risks: Workarounds

A workaround is an unplanned response to a risk event that was not identified or not expected. When a risk materializes as an issue and no contingency plan exists, a workaround is improvised to minimize the impact.

  • ▸ Used to deal with unexpected risks to reduce their impact
  • ▸ Workarounds should be documented — they become lessons learned and may identify new risks
  • ▸ Distinguished from contingency plans: contingency = planned in advance; workaround = improvised on the fly

Risk vs. Issue vs. Workaround — Key Distinctions

Concept | Timing | Response Type | Documentation
Risk | Future — has not yet occurred | Contingency plan (planned in advance) | Risk Register
Issue | Present — the risk has now occurred | Execute contingency plan (if one exists) or workaround | Issue Log / Risk Register update
Workaround | Present — unidentified risk has occurred | Improvised, unplanned response | Document for lessons learned; update risk register
Contingency Plan | Created in advance (during Plan Risk Response) | Pre-defined actions triggered when a specific risk occurs | Risk Register / Risk Response Plan
💡

Risk Monitoring is Continuous: Risks change over time. A low-probability risk can become high-probability as circumstances change. A risk can be closed when conditions change such that it can no longer occur. New risks can emerge at any stage. Regular risk review meetings are best practice.

Risk Management — Quick Reference & Exam Summary

Key formulas, mnemonics, and comparison tables for rapid reference.

Key Formulas
Risk Score = Probability × Impact
FMEA RPN = Severity × Occurrence × Detection
EMV = Probability (%) × Impact ($)
5-Step Process Mnemonic
Plan → Identify → Analyze → Respond → Monitor

"Please Identify All Risk Management" — steps in order

Negative vs. Positive Risk Strategies — Side by Side

Negative Risk (Threat) | Description | Positive Risk (Opportunity) | Description
Avoid | Eliminate the risk entirely — change the plan | Exploit | Ensure the opportunity definitely happens
Mitigate | Reduce probability and/or impact | Enhance | Increase probability and/or impact
Transfer | Shift financial impact to a third party | Share | Share the opportunity with a third party
Accept | Tolerate the risk (passive or active) | Accept | Welcome it if it occurs — without actively pursuing

Common Pitfalls to Avoid

Trap | Correct Understanding
Thinking all risks are negative | ISO 9000:2015 explicitly includes positive risks (opportunities). "Positive risk" is not an oxymoron.
Confusing Risk with Issue | Risk = future potential event. Issue = risk that has already materialized. They require different responses.
Thinking Transfer eliminates risk | Transfer moves the financial consequence to a third party — the risk event can still occur. It is not Avoid.
Confusing FMEA RPN with P&I Matrix score | FMEA uses 3 factors (including Detection). P&I Matrix uses only 2 (Probability × Impact, no Detection).
Thinking Qualitative is always done before Quantitative | True in practice — qualitative screens and prioritizes. But both can be used on different risks depending on data availability.
Passive vs. Active Acceptance | Passive = no plan; deal with it if it happens. Active = create a contingency plan in advance for the accepted risk.
Workaround vs. Contingency Plan | Contingency plan = pre-planned response for an identified risk. Workaround = improvised response for an unexpected/unidentified risk.

Risk Management Tools Summary

Tool | Step Used | Type | Key Feature
Brainstorming | Identify | Group technique | Most common identification tool
Ishikawa Diagram | Identify | Cause & effect | Organizes causes by category (6M)
SWOT Analysis | Identify | Strategic | Captures positive risks (Opportunities)
FMEA | Identify / Analyze | Failure analysis | RPN = Severity × Occurrence × Detection
P&I Matrix | Analyze (Qualitative) | Risk prioritization | Risk Score = Probability × Impact; color-coded zones
EMV Analysis | Analyze (Quantitative) | Financial analysis | EMV = P(%) × Impact($); sum across all risks
Monte Carlo | Analyze (Quantitative) | Simulation | Probability distribution of project outcomes
Decision Tree | Analyze (Quantitative) | Decision analysis | Visual branching of decisions and outcomes
Risk Register | All steps | Living document | Central repository for all risk information

About This Reference

About the Author

Mahesh Babu Nelakurthi 🔗 LinkedIn

Sr. Quality & Reliability Engineer · Ultium Cells LLC · Ohio

I currently work as a Senior Quality and Reliability Engineer at Ultium Cells LLC — a GM and LG joint venture at the forefront of America's push toward electric mobility. In advanced, high-volume manufacturing, the stakes of getting quality right are real and immediate. That environment teaches you quickly that the most dependable approach is to think in first principles: go back to what is actually known, build your reasoning from there, and let the data show you what the process is really doing.

That experience also reinforced a conviction I have held for a long time: that the fundamentals matter most. Not because advanced methods are unimportant — but because the right basic question, asked precisely, almost always points to the answer. Quality Datalabs is built around that idea: a resource grounded in first principles, free to use, and written for engineers who want to understand the why behind every decision.

Credentials & Education
🎓 American Society for Quality (ASQ) Certified Six Sigma Black Belt — CSSBB (Exp. 2029)
🎓 American Society for Quality (ASQ) Certified Quality Engineer — CQE (Exp. 2026)
📘 Master of Science in Industrial Management — Texas A&M University, Kingsville (2017)
📘 Bachelor of Technology in Chemical Engineering — Vignan's Foundation for Science, Technology & Research (2014)
Why This Exists
"We stand on the shoulders of giants. Deming, Juran, Shewhart, Taguchi, Ishikawa — they spent lifetimes building the foundations. That knowledge belongs to all of us."

Quality engineering knowledge has too often been locked behind expensive certifications, paywalled journals, and five-day seminars. This reference was built to change that — to make the full depth of quality engineering accessible to every engineer, at every level of their career.

Every Cpk we compute, every control chart we plot, every FMEA we run — these are acts of responsibility. Somewhere at the end of the supply chain is a person who will use what we make. They trust us, without knowing us, to have done the work properly.

As we stand on the shoulders of giants, we have a responsibility to be better, to strive continuously for quality products reaching the customer. That responsibility is not a burden — it is the privilege of the profession.

What This Reference Covers
📊 Six Sigma, DPMO & DMAIC
🔬 MSA — AIAG 4th Edition
📈 SPC & Process Capability
⚙️ Reliability Engineering
📐 39 Statistical Distributions
🎖️ MIL-STD-1916 & Sampling
📋 FMEA — AIAG-VDA 2019
🎓 Quality Philosophy
🏭 Supplier Quality & PPAP/APQP
🧮 Live DPMO & RPN Calculator

Have a suggestion, found an error, or want to contribute? Reach out — this reference grows through the community it serves.

🔗 Connect on LinkedIn

📬 Send an Enquiry

Found an error, have a question about the content, or want to suggest a new topic? Fill in the form — I read every submission and will get back to you directly.

* Required fields. Your email is used only to reply to your enquiry and is never shared.

Live Calculator

24 interactive calculators covering Six Sigma, Probability, Reliability, GR&R, DOE, SPC, and Sampling. Enter values — results update instantly. No data leaves your browser.

Six Sigma & Process Capability

Convert between sigma levels, DPMO, and capability indices. Enter any combination — all results update instantly.

📐

DPMO ↔ Sigma Level

Convert between defects per million opportunities and sigma level

σ_ST = NORM.S.INV(1 − DPMO/1,000,000) + 1.5
Enter either value — the other is computed
Results
DPMO
Sigma (LT)
Sigma (ST)
Yield %
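
Behind this card is a single standard-normal inversion. A minimal Python sketch using scipy, with the 1.5σ shift convention as stated in the formula above:

```python
from scipy.stats import norm

def dpmo_to_sigma_st(dpmo):
    """Short-term sigma level, with the conventional 1.5-sigma shift."""
    return norm.ppf(1 - dpmo / 1_000_000) + 1.5

def sigma_st_to_dpmo(sigma_st):
    return 1_000_000 * norm.sf(sigma_st - 1.5)   # sf(x) = 1 - cdf(x)

print(round(dpmo_to_sigma_st(3.4), 2))   # 6.0 -> the classic Six Sigma level
print(round(sigma_st_to_dpmo(4.0)))      # 6210 DPMO at 4 sigma (short-term)
```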
📏

Cp / Cpk / Ppk Calculator

Process capability from specification limits and process statistics

Cp = (USL−LSL)/(6σ) · Cpk = min[(USL−μ),(μ−LSL)]/(3σ)
Results
Cp
Cpk
Cpl
Cpu
DPMO (est.)
Sigma Level
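
The indices follow directly from the formulas on the card. A minimal sketch (the spec limits and process values are hypothetical):

```python
def capability(usl, lsl, mean, sigma):
    """Cp, Cpk, Cpu, Cpl from spec limits and process statistics (sigma > 0)."""
    cp  = (usl - lsl) / (6 * sigma)
    cpu = (usl - mean) / (3 * sigma)
    cpl = (mean - lsl) / (3 * sigma)
    return cp, min(cpu, cpl), cpu, cpl

cp, cpk, cpu, cpl = capability(usl=10.5, lsl=9.5, mean=10.1, sigma=0.1)
print(f"Cp={cp:.2f}  Cpk={cpk:.2f}")   # Cp=1.67  Cpk=1.33 -> capable but off-center
```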
📉

Z-Score ↔ Probability

Standard normal conversions — enter Z or probability

P(X < x) = Φ(z) = Φ((x−μ)/σ)
Results
P(X < z)
P(X > z)
P(|X| < z)
Z-Score
🎯

Sample Size for Capability Study

Minimum n to estimate Cpk with specified confidence

n ≥ 0.5 × χ²(α,2) / (Cpk_target × d²)
Results
Min. Sample Size n
Cpk Lower Bound
Recommendation

Probability

Classical probability rules, conditional probability, Bayes, and common distributions. Enter values and see step-by-step working.

🎲

Basic Probability Rules

Union, intersection, conditional — enter P(A) and P(B)

P(A∪B) = P(A)+P(B)−P(A∩B) · P(B|A) = P(A∩B)/P(A)
Results
P(A∪B)
A or B
P(A∩B)
A and B
P(A|B)
A given B
P(B|A)
B given A
P(A′)
Not A
P(B′)
Not B
🔮

Bayes' Theorem

Update probability given new evidence — posterior from prior

P(A|B) = P(B|A)·P(A) / P(B)
Results
P(A|B) — Posterior
P(B) — Evidence
Odds Ratio
Likelihood Ratio
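
Expanding P(B) by the law of total probability gives the posterior in one step. A minimal sketch with a hypothetical inspection scenario:

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B), with the evidence P(B) expanded by total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

# Hypothetical: 2% of parts are defective; the test flags 95% of defectives
# but also 8% of good parts (false alarms).
print(round(bayes_posterior(0.02, 0.95, 0.08), 3))   # 0.195 -> most flags are false
```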
🎯

Binomial Distribution

Probability of exactly k successes in n independent trials

P(X=k) = C(n,k)·pᵏ·(1−p)ⁿ⁻ᵏ
Results
P(X = k)
P(X ≤ k)
P(X ≥ k)
Mean (np)
Std Dev
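
A minimal scipy sketch of the card's outputs (the trial count and defect rate are hypothetical):

```python
from scipy.stats import binom

n, p, k = 20, 0.05, 2              # 20 trials, 5% defect rate, k = 2 defects
print(binom.pmf(k, n, p))          # P(X = 2)  ~ 0.1887
print(binom.cdf(k, n, p))          # P(X <= 2) ~ 0.9245
print(binom.sf(k - 1, n, p))       # P(X >= 2) = 1 - P(X <= 1) ~ 0.2642
print(binom.mean(n, p), binom.std(n, p))   # np = 1.0, sqrt(np(1-p)) ~ 0.975
```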
🔢

Poisson Distribution

Count events per unit — defects per part, failures per hour

P(X=k) = e⁻λ · λᵏ / k!
Results
P(X = k)
P(X ≤ k)
P(X > k)
Mean = Var

Reliability Engineering

MTBF, MTTR, availability, Weibull B-life, system reliability, and stress-strength interference — with live results.

⏱️

MTBF / MTTR / Availability

Core reliability metrics from failure and repair data

MTBF = Total Time / Failures · A = MTBF/(MTBF+MTTR)
Results
MTBF (hr)
MTTR (hr)
Availability A
Failure Rate λ
FIT Rate
per 10⁹ hr
Downtime %
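
All of these metrics chain from three inputs. A minimal sketch with a hypothetical year of machine data:

```python
def reliability_metrics(total_time_hr, failures, total_repair_hr):
    mtbf = total_time_hr / failures
    mttr = total_repair_hr / failures
    availability = mtbf / (mtbf + mttr)
    fit = 1e9 / mtbf                   # failures per 10^9 hours
    return mtbf, mttr, availability, fit

mtbf, mttr, a, fit = reliability_metrics(8760, 4, 32)   # one year, 4 failures
print(f"MTBF={mtbf:.0f} h  MTTR={mttr:.0f} h  A={a:.4f}  FIT={fit:.0f}")
# MTBF=2190 h  MTTR=8 h  A=0.9964  FIT=456621
```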
📈

Weibull B-Life & R(t)

Survival probability and B-life for any Weibull distribution

R(t) = exp[−(t/η)^β] · B_x = η·[−ln(1−x/100)]^(1/β)
Results
R(t) — Survival
F(t) — Failed
h(t) — Hazard rate
MTTF
B1 Life
B10 Life
B50 Life
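
The card's two formulas, plus the closed-form MTTF, in a minimal sketch (β and η are hypothetical; β = 2 represents a wear-out mode):

```python
import math

def weibull_r(t, beta, eta):
    return math.exp(-((t / eta) ** beta))

def b_life(x_pct, beta, eta):
    """Age by which x% of the population is expected to have failed."""
    return eta * (-math.log(1 - x_pct / 100)) ** (1 / beta)

beta, eta = 2.0, 10_000
print(round(weibull_r(5_000, beta, eta), 4))    # R(5000) = 0.7788
print(round(b_life(10, beta, eta)))             # B10 ~ 3246 h
print(round(eta * math.gamma(1 + 1 / beta)))    # MTTF = eta*Gamma(1+1/beta) ~ 8862 h
```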
🔗

System Reliability

Series, parallel, or k-out-of-n configurations — up to 5 components

Series: R = ∏Rᵢ · Parallel: R = 1−∏(1−Rᵢ) · k-out-of-n: R = Σⱼ₌ₖⁿ C(n,j)Rʲ(1−R)ⁿ⁻ʲ
Component Reliabilities (0–1) — leave blank to skip
Results
System R
System F
Components
Configuration
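
A minimal sketch of the three configurations; the k-out-of-n form assumes identical components, as the card's formula does (the reliabilities are hypothetical):

```python
from math import comb, prod

def series_r(rs):   return prod(rs)
def parallel_r(rs): return 1 - prod(1 - r for r in rs)

def k_of_n_r(k, n, r):
    """System works if at least k of n identical components survive."""
    return sum(comb(n, j) * r**j * (1 - r)**(n - j) for j in range(k, n + 1))

rs = [0.95, 0.98, 0.99]
print(round(series_r(rs), 4))        # 0.9217 -> weaker than the weakest link
print(round(parallel_r(rs), 7))      # 0.99999 -> redundancy pays
print(round(k_of_n_r(2, 3, 0.9), 3)) # 0.972
```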

Stress-Strength Interference

Reliability when both stress and strength are random variables

z = (μ_R−μ_S)/√(σ_R²+σ_S²) · Reliability = Φ(z)
Strength Distribution R ~ N(μ_R, σ_R)
Stress Distribution S ~ N(μ_S, σ_S)
Results
Reliability index z
Reliability R
P(Failure)
Safety Factor
μ_R/μ_S
Failures/Million
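
A minimal sketch of the interference calculation (the strength and stress values are hypothetical, in psi):

```python
from math import sqrt
from scipy.stats import norm

def interference(mu_strength, sd_strength, mu_stress, sd_stress):
    z = (mu_strength - mu_stress) / sqrt(sd_strength**2 + sd_stress**2)
    return z, norm.cdf(z)

z, r = interference(50_000, 3_000, 38_000, 4_000)
print(f"z={z:.2f}  R={r:.5f}  failures/million={1e6 * (1 - r):.0f}")
# z=2.40  R=0.99180  failures/million=8198
```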

GR&R / Measurement System Analysis

Gauge Repeatability & Reproducibility — enter variance components to get %GR&R, ndc, and AIAG acceptance guidance.

🔬

%GR&R from Variance Components

AIAG MSA 4th Ed. — enter EV, AV, PV standard deviations

GRR = √(EV²+AV²) · %GRR = 100×GRR/TV · ndc = 1.41×PV/GRR
Results — AIAG Criteria
GRR σ
TV σ
%GR&R
%EV
%AV
%PV
ndc
≥5 required
Decision
AIAG MSA criteria: %GR&R <10% = Acceptable · 10–30% = Conditional · >30% = Unacceptable. ndc ≥ 5 required for the gauge to distinguish parts.
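
A minimal sketch of the roll-up (the study values are hypothetical; truncating ndc to an integer follows the usual AIAG convention):

```python
from math import sqrt

def grr_summary(ev, av, pv):
    """AIAG MSA 4th Ed. roll-up; all inputs are standard deviations."""
    grr = sqrt(ev**2 + av**2)
    tv = sqrt(grr**2 + pv**2)
    return 100 * grr / tv, int(1.41 * pv / grr)

pct_grr, ndc = grr_summary(ev=0.039, av=0.023, pv=0.217)
print(f"%GRR={pct_grr:.1f}  ndc={ndc}")   # %GRR=20.4  ndc=6 -> conditional accept
```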
📐

%GR&R — Range Method (Quick)

From operator averages and range averages — AIAG short form

EV = R̄/d₂ · AV = √[(x̄_diff/d₂*)² − EV²/(n·r)]
Results
EV (Repeatability σ)
AV (Reproducibility σ)
GRR σ
%GR&R
ndc
Decision

Design of Experiments

Number of runs, resolution, and design properties for full factorial, fractional factorial, Plackett-Burman, and Taguchi designs.

🧪

Experiment Run Calculator

How many runs for your design type? Enter factors and levels.

Full: 2ᵏ · Fractional: 2ᵏ⁻ᵖ · PB: next multiple of 4 > k · Taguchi: Lₙ
Results
Runs Required
Resolution
Main Effects
2-way Interactions
Design
DF (error)
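
A minimal sketch of the run counts for three of the card's design families (Taguchi arrays come from the standard Lₙ tables, so they are omitted here):

```python
def runs_required(k, design="full", p=0):
    """Run counts for k factors (two-level designs)."""
    if design == "full":         # 2^k full factorial
        return 2 ** k
    if design == "fractional":   # 2^(k-p) fractional factorial
        return 2 ** (k - p)
    if design == "pb":           # Plackett-Burman: next multiple of 4 above k
        return 4 * (k // 4 + 1)
    raise ValueError(design)

print(runs_required(5))                      # 32
print(runs_required(5, "fractional", p=1))   # 16 -> 2^(5-1), Resolution V
print(runs_required(11, "pb"))               # 12
```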
📊

Main Effect & S/N Ratio

Factor effect magnitude and Taguchi Signal-to-Noise ratio

ME = Ȳ(+1) − Ȳ(−1) · S/N = −10·log₁₀(Σy²/n) [STB]
Main Effect Calculator
S/N Ratio (up to 5 replicates)
Results
Main Effect
% Change
S/N Ratio (dB)
Mean
Std Dev
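
A minimal sketch of the smaller-the-better S/N ratio named in the formula above (the replicate values are hypothetical):

```python
from math import log10

def sn_smaller_the_better(y):
    """Taguchi S/N ratio for a smaller-the-better response."""
    return -10 * log10(sum(v * v for v in y) / len(y))

print(round(sn_smaller_the_better([0.12, 0.15, 0.10]), 2))   # 18.06 dB
```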

Statistical Process Control

Control limits for variables and attribute charts — enter your process data to get UCL, LCL, and center line instantly.

📊

Control Limits Calculator

X̄-R, X̄-s, p, np, c, u charts — select type and enter data

UCL = CL + 3σ · X̄-R: UCL_R = D₄R̄ · UCL_X̄ = X̄̄ + A₂R̄
Control Limits
UCL (main)
Center Line
LCL (main)
UCL (range/s)
CL (range/s)
Process σ̂
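
A minimal sketch for one chart type: X̄-R at subgroup size n = 5, using the standard table constants (A₂ = 0.577, D₃ = 0, D₄ = 2.114):

```python
# X-bar & R control limits, subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(grand_mean, r_bar):
    return {
        "UCL_xbar": grand_mean + A2 * r_bar,
        "LCL_xbar": grand_mean - A2 * r_bar,
        "UCL_R": D4 * r_bar,
        "LCL_R": D3 * r_bar,
    }

print(xbar_r_limits(grand_mean=10.0, r_bar=0.4))
# {'UCL_xbar': 10.2308, 'LCL_xbar': 9.7692, 'UCL_R': 0.8456, 'LCL_R': 0.0}
```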
🎯

Capability from Control Chart

Estimate Cp, Cpk from R̄ or s̄ without raw data

σ̂ = R̄/d₂ (or s̄/c₄) · Cp = (USL−LSL)/(6σ̂) · Cpk = min(Cpu,Cpl)
Capability Indices
σ̂ (from R̄/d₂)
Cp
Cpk
Cpu
Cpl
DPMO (est.)
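
A minimal sketch of the estimate for an X̄-R chart at subgroup size n = 5 (d₂ = 2.326 from the standard tables; the example values are hypothetical):

```python
D2 = 2.326   # d2 constant for subgroup size n = 5

def capability_from_rbar(usl, lsl, grand_mean, r_bar):
    sigma_hat = r_bar / D2
    cp  = (usl - lsl) / (6 * sigma_hat)
    cpk = min(usl - grand_mean, grand_mean - lsl) / (3 * sigma_hat)
    return sigma_hat, cp, cpk

sigma_hat, cp, cpk = capability_from_rbar(10.5, 9.5, 10.0, 0.4)
print(f"sigma_hat={sigma_hat:.4f}  Cp={cp:.2f}  Cpk={cpk:.2f}")
# sigma_hat=0.1720  Cp=0.97  Cpk=0.97 -> centered but not capable
```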

Sampling & Confidence Intervals

AQL sampling plans, confidence intervals for means and proportions, and reliability demonstration sample sizes.

📦

AQL Sample Size — ANSI Z1.4

Lot-based acceptance sampling — single sampling plan

n and c from Z1.4 table · P(accept) = Σⱼ₌₀ᶜ C(n,j)·pʲ·(1−p)ⁿ⁻ʲ
Z1.4 Single Sampling Plan
Code Letter
Sample Size n
Accept ≤ c
Reject ≥ r
% Inspected
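
Once n and c are read from the Z1.4 tables, the acceptance probability is just a binomial CDF. A minimal sketch with a hypothetical plan of n = 80, c = 2:

```python
from scipy.stats import binom

def p_accept(n, c, p_defective):
    """One point on the OC curve: accept the lot if defects found <= c."""
    return binom.cdf(c, n, p_defective)

print(round(p_accept(80, 2, 0.01), 3))   # 0.953 -> good lots nearly always pass
print(round(p_accept(80, 2, 0.05), 3))   # 0.231 -> poor lots mostly rejected
```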
📏

Confidence Intervals

For mean (t-interval) and proportion (Wilson score)

CI_μ = x̄ ± t(α/2,n−1)·s/√n · CI_p = Wilson score interval
Results
Lower Bound
Point Estimate
Upper Bound
Margin of Error
t / z critical
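
A minimal sketch of the t-interval side (the Wilson score interval for proportions is omitted for brevity; the sample values are hypothetical):

```python
from math import sqrt
from scipy.stats import t

def mean_ci(xbar, s, n, conf=0.95):
    """Two-sided t-interval for a mean."""
    tcrit = t.ppf(1 - (1 - conf) / 2, df=n - 1)
    moe = tcrit * s / sqrt(n)
    return xbar - moe, xbar + moe, moe

lo, hi, moe = mean_ci(xbar=10.02, s=0.15, n=25)
print(f"95% CI: ({lo:.3f}, {hi:.3f})  MOE = {moe:.3f}")
# 95% CI: (9.958, 10.082)  MOE = 0.062
```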

Reliability Demonstration

Zero-failure test: sample size to prove R* at confidence C

n = ln(1−C)/ln(R*) · MTBF_lower = −T_total/ln(α), where α = 1−C
Results
Sample Size n
MTBF Lower
Conclusion
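
The success-run sample size is a one-line computation. A minimal sketch:

```python
from math import ceil, log

def zero_failure_n(r_target, confidence):
    """Units to test with zero failures to demonstrate R* at confidence C."""
    return ceil(log(1 - confidence) / log(r_target))

print(zero_failure_n(0.90, 0.90))   # 22 units
print(zero_failure_n(0.95, 0.90))   # 45 units
```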