Quality Engineering Reference Hub

We stand on the shoulders of giants.
Now it is our turn.

"Every Cpk we compute, every control chart we plot, every FMEA we run — these are not bureaucratic checkboxes. They are acts of responsibility. Somewhere at the end of our supply chain is a person who will use what we make. They trust us, without knowing us, to have done the work properly."

— Quality Datalabs

Deming, Juran, Shewhart, Taguchi, Ishikawa — they spent lifetimes building the statistical and philosophical foundations of quality. Their tools are not old. They are permanent. Ours to use, teach, and pass forward.

This reference was built because quality knowledge should be accessible, precise, and free — not locked behind expensive textbooks or five-day seminars. Whether you are running a PFMEA at midnight, explaining Ppk to your manager, or diving deep into reliability analysis, this is for you.

15+ modules · SSBB-level depth · 41 distributions · Free, always
02 · AIAG 4th Ed.
Measurement System Analysis
S.W.I.P.E. error model, stability, bias & linearity, GR&R via X̄-R & ANOVA, torque wrench drift case study.

03 · Management (New)
Quality Philosophy
Deming, Juran, Crosby, Ishikawa. PDCA, DMAIC, Lean frameworks, strategic planning, facilitation tools.

04 · Quality Systems
ISO 9001 → IATF 16949 maturity, PPAP levels, special characteristics, 8D problem solving, escalation models, customer-specific requirements.

05 · Process Control
Statistical Process Control
Cp/Cpk/Ppk, chart selector decision tree, annotated out-of-control patterns, Western Electric rules.

Interactive · Live
DPMO & Capability Calculator
Enter LSL, USL, mean, sigma. Instantly compute DPMO, sigma level, Cpk, Cp, defect probability.

07 · Reliability
Reliability Engineering
MTBF/MTTR/Availability formulas, full bathtub curve SVG, Weibull β shape cards, series & parallel systems.

08 · Distributions (New)
Statistical Distributions
Normal, Weibull, Exponential, Lognormal, Binomial, Chi-square, Poisson, t, F — formulas, properties, applications.

09 · Defense (New)
Military & Defense Standards
MIL-STD-1629A FMECA, MIL-HDBK-217F reliability prediction, ANSI Z1.4 sampling, AS9100D, AQAP-2110.

10 · Statistics (New)
Applied Statistics
Hypothesis testing, confidence intervals, regression, ANOVA, chi-square — with quality engineering examples.

11 · AIAG-VDA 2019
FMEA & RPN
DFMEA and PFMEA structure, S/O/D scales, live RPN calculator, action priority matrix.

12 · Risk (New)
Risk Management
ISO 31000 framework, risk matrix construction, bow-tie diagrams, failure mode prioritization.

13 · Experimentation (New)
Design of Experiments
Full factorial, fractional factorial, Taguchi orthogonal arrays, main effects, interaction plots, ANOVA.

14 · DFSS/SE (New)
Design for Six Sigma
DMADV roadmap, VOC to CTQ, concept selection, DOE optimisation, tolerance design, full worked example — from brief to production.
Core Philosophy
Quality is a responsibility, not a checkbox.

As we stand on the shoulders of giants, we have a responsibility to be better — to strive continuously for quality products reaching the customer. Every engineer carries the trust of the end user, someone they will never meet, who relies on the work being done properly.

Good engineering is not enough if a competitor offers the customer a better alternative. The goal is to empower every quality engineer to make better decisions, ship better products, and uphold the responsibility we carry — to the customer, to the craft, and to those who came before us.

Statistics & Process Quality

Six Sigma & DPMO

From normal distribution tails to defect probability — how sigma level, specification limits, and the 1.5σ long-term convention translate into real manufacturing quality targets.

Six Sigma Metrics Toolkit — DPU, DPO, DPMO, Yield & RTY

Before you can improve a process, you must be able to measure it precisely. Six Sigma uses a tightly connected family of metrics that scale from a single unit all the way to a million-opportunity benchmark. This tab gives you every formula, example, and visual you need.

📊 Six Sigma Metrics — How They Connect
DPU (defects per unit) → DPO (defects per opportunity) → DPMO = DPO × 1,000,000 → Sigma (Z) = Φ⁻¹(1 − DPO) → FPY / Yield → RTY = Y₁ × Y₂ × Y₃ × …

① DPU — Defects Per Unit

DPU = Total Defects ÷ Total Units

The simplest defect metric — average number of defects found on each unit regardless of how many opportunities for failure each unit had. DPU of 0.15 means roughly 1 defect per 7 units.

Example: 75 defects found across 500 units → DPU = 75/500 = 0.15
Limitation: DPU ignores complexity. A complex PCB and a simple bracket both become "one unit." Use DPO for cross-process comparison.

② DPO — Defects Per Opportunity

DPO = Total Defects ÷ (Total Units × Opportunities per Unit)

Normalises the defect rate by the number of distinct ways a unit can fail. Enables fair comparison between processes of different complexity. An "opportunity" is any characteristic that could be measured and found defective.

Example: 75 defects, 500 units, 4 opportunities each → DPO = 75/(500×4) = 0.0375
Defining opportunities consistently is critical: counting too many opportunities dilutes DPO, while counting too few inflates it.

③ DPMO — Defects Per Million Opportunities

DPMO = DPO × 1,000,000

Scales DPO to a per-million basis, making tiny defect rates intuitive and industry-comparable. The Six Sigma world-class target is 3.4 DPMO — accounting for the 1.5σ long-term drift of a real process.

Example: DPO 0.0375 → DPMO = 0.0375 × 1,000,000 = 37,500 DPMO → approximately 3.3σ process
308,537
66,807
6,210
233
3.4 ★

④ FPY — First Pass Yield

FPY = Good Units ÷ Total Units  (equal to 1 − DPO only when each unit has a single defect opportunity)

The percentage of units that complete a process step without any rework, repair, or scrap. FPY declining is often the first visible signal that hidden rework costs are accumulating. A plant can show high throughput but terrible FPY if rework is baked into the process.

Example: 460 good units from 500 → FPY = 460/500 = 92%  (8% hidden rework cost)

⑤ RTY — Rolled Throughput Yield

RTY = FPY₁ × FPY₂ × FPY₃ × … × FPYₙ

RTY multiplies yields across all process steps. Even individually high-yield steps compound to a much lower overall throughput. This is the metric that exposes the true cumulative cost of a multi-step process and shows why Six Sigma targets perfection at each step.

📊 RTY Compounding — 3-Step Process
Step 1 (FPY₁ = 98%) × Step 2 (FPY₂ = 95%) × Step 3 (FPY₃ = 97%) → RTY = 0.98 × 0.95 × 0.97 ≈ 90.3%

Three steps each at ≥95% FPY combine to only 90.3% RTY. Nearly 1 in 10 units has a defect somewhere in the process. RTY forces the question: where is the quality loss occurring?
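The whole metric family is a few lines of arithmetic. Below is a minimal Python sketch using the worked numbers from this tab (75 defects, 500 units, 4 opportunities each, 460 good units); the function name and structure are illustrative, not from any standard library.

```python
from math import prod

def six_sigma_metrics(defects, units, opportunities_per_unit, good_units):
    """Per-unit and per-opportunity defect metrics from raw counts."""
    dpu = defects / units                              # ① 75/500 = 0.15
    dpo = defects / (units * opportunities_per_unit)   # ② 75/2000 = 0.0375
    dpmo = dpo * 1_000_000                             # ③ 37,500
    fpy = good_units / units                           # ④ 460/500 = 0.92
    return dpu, dpo, dpmo, fpy

print(six_sigma_metrics(75, 500, 4, 460))

# ⑤ RTY compounds the per-step yields: 0.98 × 0.95 × 0.97 ≈ 0.903
rty = prod((0.98, 0.95, 0.97))
```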

⑥ Converting DPMO to Sigma Level (Z)

Z_LT = Φ⁻¹(1 − DPO)  |  Z_ST = Z_LT + 1.5 (the reported sigma level)

Sigma level (Z) is derived from DPO using the inverse normal CDF. Short-term Z always looks better; the 1.5σ shift accounts for long-term process drift. A process measuring 4.5σ against its specs over the long term is reported as "6 Sigma" because the convention assumes short-term performance is 1.5σ better than the long-term data show.

# DPMO → Z_LT | Formula: Z_LT = Φ⁻¹(1 − DPMO ÷ 1,000,000)
# Z_ST = Z_LT + 1.5σ (short-term is always 1.5σ better)
DPMO = 317,311 → Z_LT ≈ 1.00σ
DPMO = 45,500  → Z_LT ≈ 2.00σ
DPMO = 37,500  → Z_LT ≈ 1.78σ  ← our worked example
DPMO = 2,700   → Z_LT ≈ 3.00σ
DPMO = 233     → Z_LT ≈ 3.50σ
DPMO = 63.3    → Z_LT ≈ 4.00σ
DPMO = 3.4     → Z_LT ≈ 4.50σ  ← "6 Sigma" long-term (with 1.5σ shift)
DPMO = 0.002   → Z_LT = 6.00σ  ← true 6σ short-term, centred
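The same conversion in code, a small sketch assuming SciPy is available and using the one-sided tail convention of the formula above:

```python
from scipy.stats import norm

def dpmo_to_sigma(dpmo, shift=1.5):
    """Long-term Z from a one-sided tail DPMO, plus the reported sigma level."""
    z_lt = norm.ppf(1 - dpmo / 1_000_000)   # inverse normal CDF, Φ⁻¹
    return z_lt, z_lt + shift

print(dpmo_to_sigma(3.4))      # ≈ (4.50, 6.00): the famous "6 Sigma"
print(dpmo_to_sigma(37_500))   # ≈ (1.78, 3.28): the worked example above
```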

⑦ Quick Reference — Diagnostic Signals

DPU ↑ → DPMO ↑ → Sigma ↓

More defects = lower process capability. Focus improvement on the highest DPMO step first.

Cp High, Cpk Low

Process spread is fine but the mean is off-center. Fix centering before reducing σ.

RTY ↓ → Process Loss

Compounded yield drop exposes hidden rework cost. Drill into which step has the lowest FPY.

FPY ↓ → Rework Accumulating

Units leaving a step with defects silently inflate cost. FPY below 95% warrants immediate DMAIC attention.

How the Normal Distribution Creates DPMO

Every manufacturing process produces outputs that vary. When plotted, most processes follow a normal distribution — a symmetric bell curve where values cluster near the mean (µ) and tail off toward the extremes.

The specification limits define the acceptable range. Any output beyond LSL or USL is a defect. DPMO = the area of both red tails × 1,000,000.

📊 Anatomy of the Normal Distribution
(figure) Bell curve of process output X with LSL and USL marked: ±1σ covers 68.27% of output, ±2σ covers 95.45%; the shaded tails beyond each spec limit are defects.

Step-by-Step: µ and σ → DPMO

  • 1

    Standardize to Z

    Z = (X − µ)/σ — converts any measurement to "how many standard deviations from the mean?" Z ~ N(0,1).

  • 2

    Find Z at each spec limit

Z_USL = (USL − µ)/σ   Z_LSL = (µ − LSL)/σ. Distance from mean to each spec in σ units.

  • 3

    Compute both tail areas

p = [1 − Φ(Z_USL)] + [1 − Φ(Z_LSL)] — the red shaded areas on both sides of the curve.

  • 4

    Scale to DPMO

    DPMO = p × 1,000,000. Each additional σ level reduces DPMO by ~100×–1000×.
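A minimal sketch of steps 1 through 4, again assuming SciPy; the inputs anticipate the wall-thickness case study later in this section:

```python
from scipy.stats import norm

def dpmo_from_spec(mu, sigma, lsl, usl):
    """Standardize both spec limits, sum the two tail areas, scale to DPMO."""
    z_usl = (usl - mu) / sigma          # step 2: distance to upper spec
    z_lsl = (mu - lsl) / sigma          # step 2: distance to lower spec
    p = (1 - norm.cdf(z_usl)) + (1 - norm.cdf(z_lsl))   # step 3: both tails
    return p * 1_000_000                # step 4: scale to per-million

print(dpmo_from_spec(2.500, 0.00833, 2.450, 2.550))   # ≈ 0.002 (centred 6σ)
print(dpmo_from_spec(2.5125, 0.00833, 2.450, 2.550))  # ≈ 3.4 (+1.5σ drift)
```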

How the Metrics Connect — Follow the Flow

Inputs:
  • Spec limits: USL, LSL (the engineering requirement)
  • Process centre: µ, the mean (where the process aims)
  • Process spread: σ, the standard deviation (how much it varies)

Distance to each spec in σ units (both tails):
  • Z_USL = (USL − µ)/σ
  • Z_LSL = (µ − LSL)/σ

Tail probability (defect fraction): p = [1 − Φ(Z_USL)] + [1 − Φ(Z_LSL)], the combined red shaded area under both tails of the bell curve

DPMO = p × 1,000,000 (defects per million opportunities)

Sigma level: Z = Φ⁻¹(1 − p) + 1.5 (long-term sigma, with 1.5σ shift)

Related capability indices:
  • Cp (potential) = (USL − LSL)/(6σ): can the process fit?
  • Cpk (actual) = min(Cpu, Cpl): is it centred too?
  • Ppk (long-term) = min(Ppu, Ppl): with all sources of variation
💡

DPMO is per opportunity. If one unit has 5 weld joints and each is one "opportunity," unit defect rate ≠ DPMO. Always define what "one opportunity" means before comparing across processes.

🔑 Key Definitions

  • DPMO

    Defects Per Million Opportunities — normalizes defect rates for fair comparison across different process complexities.

  • Φ(z)

    Standard normal CDF — cumulative area under the bell curve to the left of z. Tail = 1 − Φ(z).

  • Sigma Level (Z)

    Distance from process mean to nearest spec in standard deviations. Higher = better quality.

  • True 6σ Centered

    Two-sided DPMO ≈ 0.002. Roughly 1 defect per 507 million opportunities.

Plastic Housing Wall Thickness: 2.450 – 2.550 mm

A precision injection-moulded housing for an electronic sensor. The design team has set a tight wall-thickness specification to ensure structural integrity and correct fit. LSL = 2.450 mm, USL = 2.550 mm — a bilateral tolerance of ±0.050 mm. Your task: determine whether the current process is capable, and what happens when it drifts.

The Scenario

Production data from 200 parts shows: µ = 2.500 mm (centred), σ = 0.00833 mm. A process audit later reveals mean drift to µ = 2.5125 mm — a +1.5σ shift typical of long-term process behaviour.

Step A — Compute σ Required for True 6σ

Tolerance → σ relationship
d = (2.550 − 2.450) / 2
= 0.050 mm (half-tolerance)
σ = d / Ztarget = 0.050 / 6
= 0.00833 mm required for true 6σ

Step B — Centred Process (Short-term, µ = 2.500 mm)

Z-score calculation — both spec limits
Z_USL = (2.550 − 2.500) / 0.00833 = 0.050 / 0.00833 = 6.000
Z_LSL = (2.500 − 2.450) / 0.00833 = 0.050 / 0.00833 = 6.000
p = 2 × [1 − Φ(6.000)] = 2 × 9.866×10⁻¹⁰
DPMO = 0.002  |  Sigma level = 6.0σ (ST)

Step C — After +1.5σ Drift (µ = 2.5125 mm)

Mean has drifted — Z-scores are now asymmetric
Z_USL = (2.550 − 2.5125) / 0.00833 = 0.0375 / 0.00833 = 4.500
Z_LSL = (2.5125 − 2.450) / 0.00833 = 0.0625 / 0.00833 = 7.500
p = [1 − Φ(4.5)] + [1 − Φ(7.5)] ≈ 3.398×10⁻⁶ + ~0
DPMO = 3.4  |  Sigma level = 4.5σ (LT)

Step D — Capability Indices

Cp — Potential
2.000
(USL−LSL)/(6σ) = 0.1/0.05
Cpk — Short-term
2.000
Centred → Cp = Cpk
Cpk — After Drift
1.500
min(4.5/3, 7.5/3) = 1.5
📊 Centred 6σ vs +1.5σ Shifted Process (2.450–2.550 mm spec)
(figure) LSL 2.450 and USL 2.550 marked; the centred 6σ curve at µ = 2.500 (Cpk = 2.0) sits fully inside the limits, while the +1.5σ drifted curve (Cpk = 1.5) pushes its upper tail toward the USL, giving ⚠ 3.4 DPMO.
⚠️

Even a well-designed 6σ process accumulates drift over time. This is why Six Sigma reports two separate numbers: short-term Cp/Cpk (from a tightly controlled study) and long-term Ppk (from production data including all sources of variation). Always specify which you are reporting.

📋 Process Summary

Parameter | Value
Feature | Wall thickness
LSL | 2.450 mm
USL | 2.550 mm
µ (centred) | 2.500 mm
σ (at 6σ) | 0.00833 mm
Cp | 2.000
Cpk (centred) | 2.000
DPMO (centred) | 0.002
µ after +1.5σ drift | 2.5125 mm
Z_USL (drifted) | 4.500
Cpk (drifted) | 1.500

🔑 What This Tells You

  • Cp = 2.0 — the tolerance window is twice what the process spread needs. Excellent potential.
  • Cpk = 2.0 (centred) — the process is hitting its potential. World class.
  • Cpk = 1.5 (drifted) — still very capable, but DPMO jumped from ~0 to 3.4.
  • This is why control charts matter — to catch drift before it escalates.

The 1.5σ Shift — Why "3.4 DPMO at 6σ"?

The famous 3.4 DPMO figure comes from a single assumption: real-world processes drift by approximately 1.5σ over the long term due to tool wear, raw material shifts, and environmental changes.

📊 Short-term 6σ becomes Long-term 4.5σ to the Nearest Spec
(figure) The centred short-term curve (6σ to each spec, Cpk = 2.0, DPMO ≈ 0.002, negligible) shifts +1.5σ, leaving 4.5σ to the nearest spec long-term (Cpk = 1.5, DPMO = 3.4 — the famous figure).
With 1.5σ Shift Applied — µ moves from 250 to 312.5 (the LSL = 0, USL = 500, σ = 41.667 example used in the Monte Carlo tab below)
Z_USL = (500 − 312.5) / 41.667 = 4.500
Z_LSL = (312.5 − 0) / 41.667 = 7.500
p_USL ≈ 3.398×10⁻⁶  (dominates)
p_LSL ≈ 3.186×10⁻¹⁴  (negligible)
DPMO ≈ 3.4

Cp vs Cpk — The Critical Distinction

🎯
Cp
(USL−LSL) / 6σ

Process potential. Ignores mean position. "Could it fit if centered?"

📍
Cpk
min[(USL−µ)/3σ, (µ−LSL)/3σ]

Actual capability. Accounts for mean position. Cpk ≤ Cp always.

📅
Ppk
uses σ_overall (incl. drift)

Long-term performance. Includes all variation sources including drift.

💡

Rule: Large Cp−Cpk gap = process is capable but off-center. Fix centering first before trying to reduce σ. If Cp ≥ 1.33 but Cpk < 1.33, the problem is mean position, not spread.
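A tiny sketch of the rule, reusing the wall-thickness case-study numbers; the function name is illustrative:

```python
def capability(mu, sigma, lsl, usl):
    """Cp ignores centring; Cpk penalises it. A large gap means fix centring first."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    return cp, cpk

print(capability(2.500, 0.00833, 2.450, 2.550))    # (2.00, 2.00) centred
print(capability(2.5125, 0.00833, 2.450, 2.550))   # (2.00, 1.50) after +1.5σ drift
```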

⚖️ ST vs LT Sigma

ST Z | LT Z | LT DPMO
3.0σ | 1.5σ | 66,807
4.0σ | 2.5σ | 6,210
5.0σ | 3.5σ | 233
6.0σ | 4.5σ | 3.4

Sigma Level ↔ DPMO Reference

The sigma-DPMO relationship is exponential — each additional sigma level cuts DPMO by one to three orders of magnitude. The visual below makes this concrete.

📊 DPMO at Each Sigma Level — Relative Scale (log-mapped to bar width)
(bar chart; each bar's width is log-mapped to its DPMO — the values are tabulated in full below)
Sigma (Z) | 1-sided DPMO | 2-sided DPMO | LT DPMO (+1.5σ) | Defect % | Yield %
1σ | 158,655 | 317,311 | 697,672 | 31.73% | 68.27%
2σ | 22,750 | 45,500 | 308,537 | 4.55% | 95.45%
3σ | 1,350 | 2,700 | 66,807 | 0.27% | 99.73%
4σ | 31.67 | 63.34 | 6,210 | 0.0063% | 99.9937%
5σ | 0.287 | 0.573 | 233 | 0.000057% | 99.99994%
6σ | 0.000987 | 0.00197 | 3.4 | 2.0×10⁻⁷% | 99.9999998%
7σ | 1.28×10⁻⁶ | 2.56×10⁻⁶ | 0.019 | ~0 | ~100%

Monte Carlo Simulation

Monte Carlo generates thousands of random N(µ,σ) samples and counts how often they fall outside spec limits. It validates analytical DPMO and teaches tail probability concepts visually — especially useful for non-normal processes.

```python
import numpy as np

# ── Specification limits ──
LSL, USL = 0, 500

# ── Process parameters ──
mu = 250         # mean (centred)
sigma = 41.667   # σ = 250/6 for 6σ
N = 1_000_000    # simulation size

# ── Centred process ──
x = np.random.normal(mu, sigma, N)
defects = np.sum((x < LSL) | (x > USL))
dpmo = defects / N * 1_000_000

# ── +1.5σ shifted process ──
x2 = np.random.normal(mu + 1.5 * sigma, sigma, N)
dpmo_shifted = np.sum((x2 < LSL) | (x2 > USL)) / N * 1e6
```

Simulation Results (N = 400,000)

Case | µ | Defects (N=400K) | Est. DPMO | Analytical
Centered 6σ | 250 | 0 | 0.000 | 0.00197
+1.5σ Shifted | 312.5 | 2 | 5.0 | 3.4
📖

Zero defects in 400,000 samples at true 6σ is correct, not a bug. You'd need ~500 million samples to reliably observe a single 6σ defect. For extreme sigma levels, analytical methods are far more practical than simulation.

When Simulation Beats Analytical Methods

  • When the process distribution is non-normal (skewed, bimodal, truncated)
  • When multiple interacting dimensions or GD&T stackups are involved
  • When teaching the effect of mean shift, σ reduction, or spec change visually

🎲 Required Sample Size

Sigma | Need N ≥
5σ | 175M
6σ | 507M

Rule of thumb: N ≥ 10/p for reliable estimation. Use analytical at 5σ+.

DMAIC — The Five-Phase Process Improvement Roadmap

DMAIC is the backbone of every Six Sigma project. It takes a problem through five sequential phases — each with specific tools and deliverables — to arrive at a sustainable solution that eliminates root cause rather than treating symptoms.

📊 DMAIC Process Flow — Problem to Sustainable Solution
Problem input → D DEFINE (scope · charter · SIPOC) → M MEASURE (process map · FMEA · σ) → A ANALYZE (hypothesis · ANOVA) → I IMPROVE (DOE · pilot · solutions) → C CONTROL (SPC · control plan) → Solution
D — DEFINE
Link the problem to organisational priorities and secure management commitment

Starts with COPQ/Pareto analysis to identify and prioritise the problem. SIPOC diagram scopes the project boundaries (7–8 key process steps). Ends with a signed charter containing problem statement, goal, scope, estimated savings, team, and timeline.

VOC / CTQ Tree
SIPOC Diagram
Project Charter
M — MEASURE
Establish the current baseline and validate the measurement system

A Y=F(X) process map identifies all inputs and outputs. FMEA quantifies risk by RPN. Gage R&R validates measurement before collecting capability data. The phase ends with a confirmed baseline sigma level (Cpk) and an accepted measurement system.

Y=F(X) Map
FMEA / RPN
GR&R / MSA
Process Sigma
A — ANALYZE
Identify and validate root causes with data — not opinions

Hypothesis tests (t-test, ANOVA) compare means between conditions. Correlation and regression reveal input-output relationships. 5-Whys and Ishikawa structure the cause-and-effect thinking. The phase ends with statistically validated root causes.

Hypothesis Test
ANOVA
Regression
5-Whys / Ishikawa
I — IMPROVE
Develop, test and implement solutions that address root causes

Design of Experiments (DOE) maps the relationship between input factors and output responses, finding optimal operating conditions. Solutions are piloted before full rollout. The Improve phase ends with a statistically significant improvement in the baseline metric.

DOE / RSM
Pugh Matrix
Piloting
Poka-Yoke
C — CONTROL
Sustain the gains and prevent regression to the old process

SPC charts monitor the improved process in real-time. A Control Plan documents what to measure, how often, and what action to take on signals. Updated FMEA, process maps, and SOPs transfer ownership back to the process team. Project savings are calculated and reported.

SPC Charts
Control Plan
Updated FMEA
Final Report
💡

DMAIC is not always needed. If a problem already has a known solution and action plan, it is an implementation project — just execute the plan. DMAIC is reserved for problems where the root cause is genuinely unknown.

Splitting the DMAIC — Four Focused Paths to Improvement

Full DMAIC training covers dozens of tools across five phases. Research into successful projects shows that four common paths account for the vast majority of real improvements. Each path has a clear objective, a targeted tool set, and a repeatable sequence. Matching the right path to the right problem dramatically increases success rate.

📊 Four DMAIC Paths — Match Your Problem to Its Path
DMAIC (D · M · A · I · C) splits into four focused paths:
  • Reduce Variability: MSA → SPC → Cpk (target: Cpk ≥ 1.33)
  • Reduce Failures: FMEA → MTBF → TPM (target: OEE ≥ 85%)
  • Reduce Waste: VSM → 5S → Kanban (target: lead time ↓)
  • Reduce Defects: Pareto → 5-Why → Poka-Yoke (target: DPMO ↓↓↓)
PATH 1 — Reduce Variability

Goal: Achieve stable, predictable, capable output (Cpk ≥ 1.33)

This is the heart of classic Six Sigma — SPC was its original tool. Key insight: don't start with the control chart. First validate the measurement system, then characterise the process, then chart it. Starting with charts on an unvalidated measurement system is a very common and costly mistake.

Process Map → I/O Matrix → Specs / Targets → SOPs → MSA (R&R + Stability + Linearity) → Potential Study (Cpk) → Control Chart → Eliminate Special Causes → Reduce Common Causes → Capture & Standardise
PATH 2 — Reduce Failures

Goal: Increase machine/process uptime and throughput

Targets machine breakdowns and availability losses. Asset Utilization (AU) waterfall charts identify the top loss categories. Component matrices link failure modes to parts. Weibull analysis predicts failure timing and drives condition-based maintenance strategy.

Define Target Process → AU Loss Waterfall → Component Matrix → Failure Modes → Maintenance Strategy → Autonomous Maintenance → Growth Tracking → Capture
PATH 3 — Reduce Waste (Lean)

Goal: Eliminate the 8 wastes — TIMWOOD + Skills

Value Stream Mapping reveals waste across the flow. 5S eliminates inventory and motion waste. Kanban controls overproduction. QFD aligns specs to customer need — often revealing specs that are unnecessarily tight (over-processing waste). The Lean path is the newest and most popular, but its pitfalls can trap unwary teams when it is applied to the wrong problem type.

Transport
Inventory
Motion
Waiting
Overproduction
Over-processing
Defects
+ Skills
PATH 4 — Reduce Defects

Goal: Drive defect frequency to zero

Defects are things that shouldn't be there at all — unlike variability, there is no optimal level other than zero. This is the widest, most common DMAIC path. Its tools are simple and accessible to anyone at any belt level: Pareto to prioritise, Fishbone/5-Whys to find cause, Poka-Yoke to prevent recurrence, and standardisation to sustain.

Define Defect → Measure Frequency (Pareto) → Flow Diagram → Root Cause (5-Why + Fishbone) → Solution Matrix → Pilot → Full Implementation → Standardise → Verify Results
⚠️

Path selection principle (Quick, 2019): Management sets the goal and links it to KPIs. Teams never choose their own projects — projects without management linkage lose resources to crises. The need should drive the method, just as form follows function.

COPQ & Project Selection — Linking Six Sigma to Business Results

Every Six Sigma project must be tied to real business cost — otherwise it competes with day-to-day operations and loses. The Cost of Poor Quality (COPQ) framework ensures projects are prioritised by financial impact, not by seniority or gut feeling.

📊 Cost of Quality — The Four Buckets
Cost of Quality (COQ), the total quality-related spend, splits into:
  • Cost of Good Quality (COGQ) — conformance costs: Prevention (training · audit · design) and Appraisal (inspection · calibration)
  • Cost of Poor Quality (COPQ) — non-conformance / failure costs: Internal Failure (scrap · rework · downtime) and External Failure (returns · warranty · recalls)

Six Sigma projects must be identified from internal failure and external failure categories first — these directly impact bottom-line results. Prevention spending typically returns 3–5× its cost by reducing the failure categories. "Gating the defect" — catching quality issues in-house before they reach the customer — is a fundamental discipline.

Multi-Level Pareto — Drilling to Project Scope

A single Pareto identifies the biggest problem category. A second-level Pareto drills into that category. If a problem appears in the top 3 at both levels — by frequency and cost — it is the ideal project candidate.

📊 Two-Level Pareto — From Symptom to Project
Level 1 — Problem Category: Bad Data 45% · Inefficient Apps 34% · Vendors 20% · Missing Spec 5% → drill into Bad Data
Level 2 — Bad Data by Day: Thursday 43% · Friday 30% · Wednesday 18% · Tuesday 7%
→ Project: fix bad data entry on Thursdays (43% of the Bad Data category)

Project Charter — The Contract Between Team and Management

Required Charter Elements
  • ✓ Problem statement (what, where, when, magnitude)
  • ✓ Measurable goal with deadline
  • ✓ Scope — start & end point, in/out of scope
  • ✓ Team members & roles
  • ✓ Estimated savings from COPQ analysis
  • ✓ Timeline with phase gate milestones
  • ✓ Management signature (resource commitment)
Common Project Selection Failures
  • ✗ Choosing the hardest problem (years-old issue)
  • ✗ Selecting an already-approved capital project
  • ✗ No link to financial impact or KPIs
  • ✗ Team members choose their own projects
  • ✗ Scope too broad — "reduce all defects"
  • ✗ No management sign-off or resource commitment
  • ✗ Renaming existing firefighting as a DMAIC project

Selection rule (Shankar, ASQ 2009): Start from external failure costs, then internal failure costs. Problems with the highest combined frequency and cost across multiple Pareto levels are the ideal candidates. The data dictates priority — not management preference or the loudest voice in the room.

AIAG MSA 4th Edition (June 2010)

Measurement System Analysis

Before trusting process data, trust your measurement system. MSA quantifies how much observed variation is process — and how much is just the gauge. Every PPAP, every SPC chart, every capability index depends on getting this right first.

σ²_observed = σ²_actual + σ²_MSA

MSA Variation Taxonomy — The Complete Tree

Every measurement you take contains two fundamentally different kinds of variation. Understanding their structure is the foundation of all MSA work. The tree below shows the complete decomposition — from total observed variation down to each individual error source.

🌳 Measurement System Variation — Full Taxonomy Tree

Total Observed Variation: σ²_observed = σ²_process + σ²_measurement
  • Process Variation — Part-to-Part (σ²_p): true part-to-part differences; drives SPC & capability indices (the signal we want to see)
  • Measurement Variation — Gauge / MSA Error (σ²_ms):
      – Accuracy (systematic / trueness error): Bias (offset from true value) · Linearity (bias varies by part size) · Stability (bias drifts over time)
      – Precision (random / scatter error): Repeatability (same operator, same part) · Reproducibility (between operators) → GR&R (Gage R&R = repeatability + reproducibility)

Accuracy vs Precision — The Core Distinction

ACCURACY — Systematic Error

How close measurements are to the true reference value. Accuracy errors are consistent — they shift every reading in the same direction. A perfectly precise gauge can still be completely inaccurate.

Three components: Bias · Linearity · Stability
PRECISION — Random Error

How close repeated measurements are to each other. Precision errors are random — they scatter results around some central value. High precision doesn't guarantee accuracy; a precise gauge can be precisely wrong.

Two components: Repeatability · Reproducibility → GR&R

The Five MSA Error Components Explained

① BIAS — Accuracy Component

The systematic offset from true value

Bias is the difference between the observed average measurement and the reference/true value for the same part. A gauge with positive bias reads high consistently; negative bias reads low. It is measured by comparing the gauge average against a known reference standard (master part).

Bias = X̄_observed − Reference_Value  |  %Bias = Bias / Process_Variation × 100

Cause: Worn gauge, incorrect calibration, wrong reference standard, elastic deformation of gauge or part.

② LINEARITY — Accuracy Component

Bias that changes across the measurement range

Linearity asks: "Is bias the same at low values as at high values?" A gauge may read accurately near 5mm but overread near 25mm. Linearity is assessed by measuring multiple reference parts spread across the full operating range and plotting bias vs. reference value. The slope of the regression line is the linearity error.

Linearity = slope × Process_Variation  |  %Linearity = |slope| × PV × 100

Cause: Gauge not calibrated across full range, non-linear amplifier response, mechanical wear concentrated at one end of travel.

③ STABILITY — Accuracy Component

Bias drift over time

Stability (also called drift) measures whether the gauge's accuracy changes over time. A stable gauge produces the same average reading on a reference part measured today, next week, and next month. It is assessed by measuring a master part periodically and charting the averages on an Individuals (XmR) control chart. An out-of-control point signals a stability problem.

Stability = |Bias_time1 − Bias_time2|  |  Monitored via XmR chart on reference part

Cause: Thermal drift, electrical component aging, mechanical wear, contamination, re-calibration interval too long.

④ REPEATABILITY (EV) — Precision Component

Within-operator scatter — Equipment Variation

Repeatability is the variation obtained when one operator measures the same part multiple times under the same conditions. It represents the fundamental noise floor of the instrument — the best the gauge can possibly do. AIAG calls this Equipment Variation (EV). Even with a perfect operator technique, a poor gauge yields high repeatability error.

σ²_repeatability = MS_repeatability (ANOVA)  |  EV = R̄ × K₁ (Avg & Range)

Reduces with: Gauge overhaul, reducing environmental noise, better fixturing, increased resolution. This is the component that can ONLY be improved by instrument upgrade.

⑤ REPRODUCIBILITY (AV) — Precision Component

Between-operator scatter — Appraiser Variation

Reproducibility is the variation in measurement averages obtained by different operators measuring the same part with the same gauge. It captures differences in technique, fixture loading, data reading habits, and environmental sensitivity. AIAG calls this Appraiser Variation (AV). High AV tells you training and procedure standardisation is the priority — not a new gauge.

σ²_reproducibility = (MS_operator − MS_repeatability) / (n×p)  |  GR&R = √(EV² + AV²)

Reduces with: Operator training, written measurement procedures (SOP), better fixtures, fixture gauging to remove human positioning variation.

💡

AIAG Priority Rule: Always resolve accuracy problems (Bias → Linearity → Stability) before running a GR&R study. A biased or drifting gauge will corrupt your GR&R data. Recalibrate first, then study precision.

Three Methods to Quantify GR&R (Precision)

📏
Average & Range
Manual calculation using ranges. Easy but uses std dev not variance — percentages don't add to 100%. Not recommended by AIAG or Wheeler.
📊
ANOVA
Uses variance components. Detects operator×part interaction. AIAG-preferred. % of total variance sums to 100%.
🎯
EMP (Wheeler)
Evaluating the Measurement Process. Uses control charts + intraclass correlation ρ. Classifies gauge as 1st–4th Class Monitor.

See the GR&R — 3 Methods and EMP Method tabs for full worked examples using the AIAG 4th Edition reference dataset (3 operators × 10 parts × 3 trials).

The Fundamental MSA Equation

Every measurement you take is the sum of two things: what the process actually produced, and the noise your gauge added. MSA separates them.

📊 Variance Decomposition — The Core MSA Identity
σ²_observed (what you see) = σ²_actual (true process) + σ²_MSA (gauge error). GRR inflates observed variation: a bad gauge makes a capable process look incapable.

Total Observed Variance: σ²_obs = σ²_actual + σ²_GRR
%GRR (% of study): %GRR = 100 × σ_GRR / σ_obs
ndc (distinct categories): ndc = 1.41 × (σ_p / σ_GRR), must be ≥ 5 for adequate discrimination
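In code, both headline statistics fall out of the variance components directly. A sketch with illustrative inputs (not the AIAG reference values):

```python
import math

def grr_summary(var_grr, var_parts):
    """%GRR of total study variation and the number of distinct categories."""
    sd_grr, sd_parts = math.sqrt(var_grr), math.sqrt(var_parts)
    sd_total = math.sqrt(var_grr + var_parts)   # σ²_obs = σ²_actual + σ²_GRR
    pct_grr = 100 * sd_grr / sd_total
    ndc = int(1.41 * sd_parts / sd_grr)         # truncated; must be ≥ 5
    return pct_grr, ndc

print(grr_summary(0.09, 1.21))   # illustrative inputs → ≈ (26.3%, 5)
```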

S.W.I.P.E. — The Five Error Sources (AIAG 4th Ed.)

S
Standard
Reference value, NIST traceability chain, master calibration. An operational definition: same meaning to supplier and customer, yesterday and today.
W
Workpiece
Part geometry, surface finish, within-part variation. If the wrong variable is measured, no level of precision helps.
I
Instrument
Gage design, discrimination, maintenance. The 10-to-1 rule: discrimination must be ≤ 1/10 of process variation (not tolerance).
P
Person
Appraiser technique, training, skill. The most common error source in manual measurement and in product/process qualification studies.
E
Environment
Temperature, humidity, vibration, cleanliness. The most common error source in highly automated measurement systems.

AIAG Mandatory Sequence — Never Skip

① Stability — drift over time? → ② Linearity — bias varies by size? → ③ Bias — constant offset? → ④ GR&R study — only if ①–③ pass

Discrimination — The 10-to-1 Rule

The 4th Edition updated this rule: instrument discrimination must be at most 1/10 of the process variation (σ × 6), not 1/10 of the tolerance. This reflects the philosophy of process-focused quality — the process, not the spec, drives measurement requirements.

ndc | Ability | Use case
1 | Go/no-go only | Cannot distinguish values. Control only if large Cp and flat loss function.
2–4 | Coarse estimation | Semi-variable control only. Cannot reliably estimate process parameters.
≥ 5 | Adequate | Can be used with variables control charts. AIAG minimum requirement.
≥ 10 | Excellent | Full analytical resolution. No discrimination concerns.
⚠️

Deming's Funnel / Tampering Warning (AIAG 4th Ed. Ch. I-B): A measurement system with large variation causes operators to adjust processes that don't need adjustment. Autocompensation that adjusts by the last result (Rule 2) adds variation — the exact opposite of its intent. Never adjust a stable process based on a single measurement.

🔑 Key Definitions (AIAG 4th Ed.)

  • Bias

    Difference between observed average and reference value. Systematic error. Assessed by t-test: H₀: bias=0 at α=0.05.

  • Stability (Drift)

    Change in bias over time. Tracked with X̄&R control charts on a reference part. Must be confirmed FIRST.

  • Linearity

    Change in bias over the operating range. Regression: slope=0 (H₀) tested at α=0.05. 5 parts covering full range.

  • Repeatability (EV)

    One appraiser, same part, same gage. Equipment Variation. Within-system error.

  • Reproducibility (AV)

    Different appraisers, same gage, same part. Appraiser Variation. Between-system error.

  • GR&R

    GRR² = EV² + AV². The combined measurement system capability estimate.

  • Measurement Uncertainty

    Different from MSA. MSA = understand sources. Uncertainty = range expected to contain true value. True = Observed ± U.

Bias Study — Independent Sample Method

Tests H₀: bias = 0. The calculated average bias is evaluated to determine if it could be due to random sampling variation — or if there is a true systematic offset that needs recalibration.

Step-by-Step Procedure

  • 1

    Establish Reference Value

    Send part to metrology lab or measure n≥10 times with higher-order instrument. Average = reference value. Choose a part near mid-range of production variation.

  • 2

    Collect Measurements

    Measure the same part n≥10 times under normal conditions by the lead operator.

  • 3

    Check Repeatability First

    %EV = 100[σ_r / TV]. If %EV is large, fix repeatability before continuing — bias test assumes acceptable repeatability.

  • 4

    Compute t-statistic

    t = bias / σ_b where σ_b = σ_r / √n. Reject H₀ if |t| > t(α/2, n−1). Default α = 0.05.

  • 5

    Check CI Contains Zero

    bias ± t(0.025, n−1) × σ_b. If zero is within CI → bias is acceptable.

Worked Example — AIAG MSA 4th Ed. (p.90–91)

📋

Reference value = 6.00. n = 15 readings by lead operator. Expected process variation (σ) = 2.5.

📊 AIAG Bias Study Data — 15 Readings (Reference = 6.00)
Readings:   5.8, 5.7, 5.9, 5.9, 6.0, 6.1, 6.0, 6.1, 6.4, 6.3, 6.0, 6.1, 6.2, 5.6, 6.0
Deviations: −0.2, −0.3, −0.1, −0.1, 0.0, +0.1, 0.0, +0.1, +0.4, +0.3, 0.0, +0.1, +0.2, −0.4, 0.0
AIAG 4th Ed. Bias Analysis — Table III-B 2
Inputs
Reference = 6.00
x̄ = 6.0067 (15 readings)
Bias = 6.0067 − 6.00 = +0.0067
Significance Test
σ_repeatability = 0.2120
σ_b = σ_repeatability / √15 = 0.0547
t = Bias / σ_b = 0.122
t_crit (14 df, α = 0.05) = 2.145
|t| < t_crit → bias is NOT statistically significant

Result from AIAG 4th Ed.: The bias is statistically acceptable. Zero falls within the 95% CI of (−0.1107, +0.1241). The measurement system can proceed to GR&R study.

Common Causes of Non-Zero Bias

  • Instrument needs calibration (most common)
  • Worn instrument, equipment, or fixture
  • Worn or damaged master; error in master
  • Instrument made to wrong dimension
  • Instrument measuring the wrong characteristic
  • Instrument correction algorithm incorrect

📋 Bias Study Summary

Parameter | Value
Reference Value | 6.000
X̄ (15 readings) | 6.0067
Bias | +0.0067
σ_r (repeatability) | 0.2120
%EV | 8.5%
t_stat | 0.122
t_critical (α=0.05) | 2.145
95% CI lower | −0.1107
95% CI upper | +0.1241
Zero in CI? | YES ✓
Verdict | ACCEPTABLE
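The entire study reproduces in a few lines, assuming NumPy and SciPy are available; the results match the CI in the table above:

```python
import numpy as np
from scipy import stats

readings = np.array([5.8, 5.7, 5.9, 5.9, 6.0, 6.1, 6.0, 6.1,
                     6.4, 6.3, 6.0, 6.1, 6.2, 5.6, 6.0])
reference = 6.00

bias = readings.mean() - reference                  # +0.0067
sigma_r = readings.std(ddof=1)                      # 0.2120 (repeatability)
sigma_b = sigma_r / np.sqrt(readings.size)          # standard error of the bias
t_stat = bias / sigma_b                             # 0.122
t_crit = stats.t.ppf(0.975, df=readings.size - 1)   # 2.145
ci = (bias - t_crit * sigma_b, bias + t_crit * sigma_b)   # (−0.1107, +0.1241)
# |t| < t_crit and zero inside the CI → bias is statistically acceptable
```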

Linearity Study

Linearity = bias that changes with the size of the part being measured. A gage may be perfectly accurate at one point in its range and badly biased at another. Tests if the slope of bias vs. reference value equals zero.

How to Conduct (AIAG 4th Ed.)

  • 1

    Select 5 Parts Across Full Range

    Choose g ≥ 5 parts whose measurements, due to process variation, cover the full operating range of the gage.

  • 2

    Establish Reference Values

    Have each part measured by layout inspection. Confirm the gage's operating range is fully covered.

  • 3

    Measure m ≥ 10 Times Each

    One operator, same gage, random order (to prevent recall bias).

  • 4

    Regression Analysis

Fit bias = intercept + slope × reference. Test H₀: slope = 0 (no linearity) AND H₀: intercept = 0 (no constant bias). Both must pass.

Worked Example — AIAG MSA 4th Ed. (Table III-B 4)

Part | Ref. Value | Avg Bias | Verdict
1 | 2.00 | +0.507 | Large positive bias
2 | 4.00 | — | —
3 | 6.00 | +0.083 | Near zero
4 | 8.00 | — | —
5 | 10.00 | −0.614 | Large negative bias
AIAG 4th Ed. Linearity Analysis — Table III-B 5
Regression: Bias = intercept + slope × Ref
slope = −0.1429
intercept = 0.8373
Significance: t_slope = −3.116, p < 0.05 → linearity is significant
R² = 0.3266
🔴

AIAG conclusion: This measurement system has a linearity problem. The bias starts large and positive at small part sizes and switches to large negative at large sizes. The gage must be recalibrated across its full range before use. Cannot be used for product/process analysis in this condition.

Graphical Pass/Fail Rule

Plot bias vs reference value with best-fit line and confidence bands. For linearity to be acceptable, the "bias = 0" horizontal line must lie entirely within the confidence bands of the fitted regression line. If the zero line exits the bands at any point — linearity problem exists regardless of numerical results.

📊 Linearity vs Bias at a Glance

(figure) Bias plotted against reference value: a flat line offset from bias = 0 is constant bias (OK, fix by calibration); a sloped line is linearity error (must fix across the range).
📌

Constant bias can be corrected by recalibration. A linearity error requires hardware or software modification across the full operating range.

Stability Study — Change in Bias Over Time

A stable gauge gives the same bias today as it did last month. Stability must be confirmed with X̄&R control charts on a reference part before any GR&R study begins — an unstable system produces meaningless GR&R results.

Procedure

  • 1

    Select Reference Part

    Near mid-range of production variation. Establish reference value from lab/higher-order system. May want masters at low, mid, and high range — separate charts for each.

  • 2

    Periodic Measurement

    Measure the reference part n=3–5 times per period. Weekly or daily, depending on expected drift rate. Plan ≥20 subgroups before final assessment.

  • 3

    X̄&R Control Charts

    Plot and analyze. Look for: trends, shifts, out-of-control signals, cycles. No specific %Stability index — analysis is through control chart interpretation.

  • 4

    Pass / Fail

    Stable = no OOC signals, no trends. Unstable = any OOC signal, trend, or systematic drift. Do not proceed to GR&R until stable.

AIAG Example — Stability Study Data (Figure III-B 1)

From AIAG MSA 4th Ed., Figure III-B 1 — Stability Study
Reference value = 6.00
X̄ chart limits: UCL = 6.11  |  LCL = 5.72
R chart limits: UCL_R = 0.73  |  LCL_R = 0
All points within limits → measurement system stable
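As a sketch of the chart arithmetic behind such a study, the block below computes X̄&R limits from synthetic subgroup data standing in for the periodic reference-part readings; the data are invented for illustration, while A₂, D₃, D₄ are the standard Shewhart constants for n = 5:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical stability data: reference part measured 5 times per week, 20 weeks
subgroups = rng.normal(loc=6.00, scale=0.07, size=(20, 5))

xbar = subgroups.mean(axis=1)
r = subgroups.max(axis=1) - subgroups.min(axis=1)
xbarbar, rbar = xbar.mean(), r.mean()

A2, D3, D4 = 0.577, 0.0, 2.114   # Shewhart constants for subgroup size n = 5
ucl_x, lcl_x = xbarbar + A2 * rbar, xbarbar - A2 * rbar
ucl_r, lcl_r = D4 * rbar, D3 * rbar

out_of_control = (xbar > ucl_x) | (xbar < lcl_x) | (r > ucl_r)
# Any True here → investigate and restore stability before running GR&R
```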

Stability vs Other MSA Properties

Property | Varies with | Study design | Chart type | Order
Stability | TIME | Same part, time changes | X̄&R over time | ① First
Linearity | RANGE | Different parts, same time | Regression plot | ② Second
Bias | — | Same part, single session | Histogram + CI | ③ Third
GR&R | — | Multiple parts, appraisers, trials | X̄&R or ANOVA | ④ Last
⚠️

No specific %Stability threshold exists in AIAG 4th Ed. The manual explicitly states: "Other than normal control chart analyses, there is no specific numerical analysis or index for stability." Pass/fail is entirely based on control chart interpretation. The torque wrench example from our other module uses a calculated percentage — that is a customer-specific metric, not an AIAG standard.

🔧 Why Stability First?

If the bias is changing over time while you conduct a GR&R study, your results are meaningless. The study will reflect a snapshot of a moving target — not the true long-term measurement system capability.

🚨

GR&R on an unstable system = wasted effort. Calibrate, investigate, and restore stability first.

Possible causes of instability
  • Wear in measurement equipment
  • Damaged or worn standard/master
  • Temperature / humidity cycling
  • Electronic drift in sensors
  • Spring fatigue (torque wrenches)
  • Contamination or lubricant buildup

GR&R Study Methods — X̄-R Method

The X̄-R method (Average and Range) is the automotive industry standard: 3 appraisers × 10 parts × 2–3 trials, randomised order. Cannot detect appraiser-by-part interaction, but well understood and widely accepted for PPAP.

Complete AIAG Example (Table III-B 15/16)

GRR Study Setup — 3 Appraisers × 10 Parts × 3 Trials
Key ANOVA Results
Parts F = 128.93 (p < 0.001)
Appraisers F = 0.424 (p = 0.661)
Interaction F = 0.434 (p = 0.850)
Variance Components
σ²_repeatability = 0.04007
σ²_reproducibility = 0.00456
σ²_GRR = 0.04463
σ²_parts = 0.17020

GRR Acceptance Zones

✓ <10% — Acceptable
⚠ 10–30% — May be acceptable, depending on application importance and cost
✗ >30% — Unacceptable

Three Accepted Methods

Range method: Uses ranges from pairs of measurements. Provides only combined GRR — cannot separate EV from AV. Not acceptable for PPAP submission. Used for quick initial screening to see if a formal study is warranted.

GRR = R̄ / d₂* where d₂* depends on sample size and number of subgroups.

Average & Range (X̄-R) method: 3 appraisers × 10 parts × 2–3 trials, random order. Uses control chart constants K₁, K₂, K₃ to separate EV and AV. Cannot estimate appraiser-by-part interaction. Most common in PPAP packages.

ANOVA method: Most statistically powerful. Handles any experimental setup. Detects appraiser-by-part interaction — a source the X̄-R method misses. Decomposes: Parts, Appraisers, Interaction, Equipment. AIAG recommends this method when a computer is available.

What GRR Diagnostics Tell You

Finding | Root Cause | Action
EV large vs AV | Instrument problem | Maintenance, redesign, fix clamping
AV large vs EV | Appraiser technique differs | Retrain, clarify procedure, add fixture
Interaction significant | Appraisers handle parts differently | Standardise measurement procedure
ndc = 1 or 2 | Poor discrimination | Upgrade gauge resolution

📋 Study Results (AIAG Example)

Source | StdDev | %TV
EV (Repeat.) | 0.202 | 17.6%
AV (Reprod.) | 0.230 | 20.0%
GRR Total | 0.306 | 26.7%
PV (Parts) | 1.104 | 96.4%
TV | 1.146 | 100%
ndc | 5
⚠️

At 26.7% GRR, this system is in the "may be acceptable" zone. AIAG says decision should be based on application importance and cost.

ANOVA Method — Same Data, Better Results

ANOVA on the same 3×10×3 dataset detects whether appraiser-by-part interaction is significant — something the X̄-R method simply cannot see. When interaction is non-significant, results are pooled into the equipment term.

The ANOVA Table (AIAG Table III-B 7)

Source | DF | SS | MS | F | Significant?
Appraiser | 2 | 3.1673 | 1.58363 | 34.44 | Yes (α=0.05)
Parts | 9 | 88.3619 | 9.81799 | 213.52 | Yes (α=0.05)
Appraiser×Part | 18 | 0.3590 | 0.01994 | 0.434 | NO — pooled
Equipment | 60 | 2.7589 | 0.04598 | — | —
Total | 89 | 94.6471 | — | — | —
ANOVA Pooling Decision — Interaction Non-Significant
Interaction F = 0.434 < F_critical → pool with Equipment
Pooled MS_equipment = (SS_interaction + SS_repeatability) / (df_int + df_rep)
= (0.3590 + 2.7589) / (18 + 60) = 0.0400
σ²_repeatability = MS_equipment = 0.0400

ANOVA vs X̄-R: Side-by-Side

Method | EV | AV | GRR | %GRR | Interaction
X̄-R Method | 0.202 | 0.230 | 0.306 | 26.7% | Cannot detect
ANOVA | 0.200 | 0.227 | 0.302 | — | 0 (not significant)

Results are very close — this is expected when interaction is non-significant. ANOVA gives slightly more accurate estimates due to better partitioning. The key ANOVA advantage is detecting the interaction term.

💡

When does interaction matter? If the interaction term were significant (parallel lines on interaction plot = no interaction; crossing lines = interaction), it would indicate different appraisers handle different parts inconsistently — a training or fixture problem specific to certain part geometries.

📊 Graphical Tools — ANOVA

  • Interaction Plot

    Appraiser avg per part vs part number. Parallel lines = no interaction. Crossing lines = interaction present.

  • Error Charts

    Individual deviations from reference. Appraiser A: positive bias. Appraiser C: negative bias (from AIAG example).

  • Whiskers Chart

    High/low/average per part per appraiser. Reveals inconsistent appraisers across different part sizes.

  • Residual Plot

    Fitted vs residual values. Check for randomness — any pattern suggests model inadequacy.

EMP Method — Evaluating the Measurement Process

The EMP methodology, developed by Dr. Donald J. Wheeler, goes beyond a simple pass/fail percentage. It uses control charts to validate the study, computes variance components (not standard deviations), and classifies your measurement system as a First, Second, Third, or Fourth Class Monitor — giving you actionable intelligence about what the gauge can actually do in production.

📖

Source: All three GR&R methods below use the AIAG 4th Edition reference dataset — 3 operators (A, B, C) × 10 parts × 3 trials each = 90 measurements total. This allows direct comparison of methods on identical data.

The AIAG Reference Dataset (Table 1)

Op. | Trial | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10
A | 1 | 0.29 | −0.56 | 1.34 | 0.47 | −0.80 | 0.02 | 0.59 | −0.31 | 2.26 | −1.36
A | 2 | 0.41 | −0.68 | 1.17 | 0.50 | −0.92 | −0.11 | 0.75 | −0.20 | 1.99 | −1.25
A | 3 | 0.64 | −0.58 | 1.27 | 0.64 | −0.84 | −0.21 | 0.66 | −0.17 | 2.01 | −1.31
B | 1 | 0.08 | −0.47 | 1.19 | 0.01 | −0.56 | −0.20 | 0.47 | −0.63 | 1.80 | −1.68
B | 2 | 0.25 | −1.22 | 0.94 | 1.03 | −1.20 | 0.22 | 0.55 | 0.08 | 2.12 | −1.62
B | 3 | 0.07 | −0.68 | 1.34 | 0.20 | −1.28 | 0.06 | 0.83 | −0.34 | 2.19 | −1.50
C | 1 | 0.04 | −1.38 | 0.88 | 0.14 | −1.46 | −0.29 | 0.02 | −0.46 | 1.77 | −1.49
C | 2 | −0.11 | −1.13 | 1.09 | 0.20 | −1.07 | −0.67 | 0.01 | −0.56 | 1.45 | −1.77
C | 3 | −0.15 | −0.96 | 0.67 | 0.11 | −1.45 | −0.49 | 0.21 | −0.49 | 1.87 | −2.16
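Because the full dataset is given, the Average & Range calculation can be reproduced directly. A NumPy sketch (the constants K₁ = 0.5908 for 3 trials, K₂ = 0.5231 for 3 appraisers, and K₃ = 0.3146 for 10 parts are the AIAG table values):

```python
import numpy as np

# AIAG 4th Ed. dataset: rows = 3 trials, columns = 10 parts, one block per operator
A = np.array([[0.29,-0.56,1.34,0.47,-0.80,0.02,0.59,-0.31,2.26,-1.36],
              [0.41,-0.68,1.17,0.50,-0.92,-0.11,0.75,-0.20,1.99,-1.25],
              [0.64,-0.58,1.27,0.64,-0.84,-0.21,0.66,-0.17,2.01,-1.31]])
B = np.array([[0.08,-0.47,1.19,0.01,-0.56,-0.20,0.47,-0.63,1.80,-1.68],
              [0.25,-1.22,0.94,1.03,-1.20,0.22,0.55,0.08,2.12,-1.62],
              [0.07,-0.68,1.34,0.20,-1.28,0.06,0.83,-0.34,2.19,-1.50]])
C = np.array([[0.04,-1.38,0.88,0.14,-1.46,-0.29,0.02,-0.46,1.77,-1.49],
              [-0.11,-1.13,1.09,0.20,-1.07,-0.67,0.01,-0.56,1.45,-1.77],
              [-0.15,-0.96,0.67,0.11,-1.45,-0.49,0.21,-0.49,1.87,-2.16]])
ops = [A, B, C]

K1, K2, K3 = 0.5908, 0.5231, 0.3146   # 3 trials, 3 appraisers, 10 parts
n, r = 10, 3

R_bar = np.mean([op.max(axis=0) - op.min(axis=0) for op in ops])      # 0.3417
EV = R_bar * K1                                                       # 0.202
x_diff = max(op.mean() for op in ops) - min(op.mean() for op in ops)  # 0.4446
AV = np.sqrt((x_diff * K2) ** 2 - EV**2 / (n * r))                    # 0.230
GRR = np.hypot(EV, AV)                                                # 0.306
Rp = np.ptp(np.mean(ops, axis=(0, 1)))   # range of the 10 part averages
PV = Rp * K3                                                          # 1.104
TV = np.hypot(GRR, PV)                                                # 1.146
print(f"%GRR = {100*GRR/TV:.1f}%, ndc = {int(1.41*PV/GRR)}")          # 26.7%, 5
```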

EMP Variance Component Formulas

Like ANOVA, EMP works in variances (not standard deviations). Subgroups are each operator×part combination (e.g., A-Part1 = {0.29, 0.41, 0.64}). The average range R̄ drives all calculations.

X̄-R Method — Step by Step
Step 1 — Repeatability (EV)
EV = R̄ × K₁
R̄ = 0.3417 (avg range of the 30 operator×part subgroups)
K₁ = 0.5908 (3 trials)
EV = 0.202
Step 2 — Reproducibility (AV)
AV = √((x̄_diff × K₂)² − EV²/(n·r))
x̄_diff = 0.4446, K₂ = 0.5231 (3 appraisers), n·r = 30
AV = 0.230

EMP Variance Results (Table 6)

Component | Variance | % of Total
Repeatability | 0.0407 | 3.1%
Reproducibility | 0.0531 | 4.1%
Product (Part-to-Part) | 1.216 | 92.8%
Total | 1.310 | 100.0%

The Intraclass Correlation Coefficient (ρ)

This is EMP's key metric — the ratio of part variance to total variance. It tells you what fraction of observed variation is real product signal vs. gauge noise.

Intraclass Correlation Coefficient ρ
ρ = σ²_p / σ²_total = 1 − (σ²_GRR / σ²_total)
ρ close to 1 → most variance is from real part differences (good). ρ close to 0 → measurement system dominates (bad).

Wheeler's Four Monitor Classes — Interpreting ρ

ρ Range | Class | Signal Reduction | Chance of Detecting ±3σ Shift | Track Process? | %R&R / AIAG
0.8 – 1.0 | First Class ★ | <10% | >99% (Rule 1) | Up to Cp₈₀ | 0–20% · Acceptable
0.5 – 0.8 | Second Class | 10–30% | >88% (Rule 1) | Up to Cp₅₀ | 20–50% · Marginal
0.2 – 0.5 | Third Class | 30–55% | >91% (Rules 1–4) | Up to Cp₂₀ | 50–80% · Unacceptable
0.0 – 0.2 | Fourth Class | >55% | Rapidly Vanishing | Unable to Track | 80–100% · Unacceptable

Adapted from EMP III: Evaluating the Measurement System, Donald J. Wheeler, SPC Press, 2006.

Our example result: ρ = 0.928 → First Class Monitor. This means less than 10% reduction in process signal, better than 99% chance of detecting a ±3σ shift with Rule 1, and the measurement system can track process improvements all the way to Cp₈₀. The gauge is excellent for SPC use.
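In code, the classification is a lookup on ρ computed from the variance components in the EMP table above; the cutoffs are Wheeler's:

```python
var_rep, var_repro, var_parts = 0.0407, 0.0531, 1.216   # from the EMP table
var_total = var_rep + var_repro + var_parts

rho = var_parts / var_total                             # ≈ 0.928
cutoffs = [(0.8, "First Class"), (0.5, "Second Class"),
           (0.2, "Third Class"), (0.0, "Fourth Class")]
monitor = next(name for lo, name in cutoffs if rho >= lo)
print(f"ρ = {rho:.3f} → {monitor} Monitor")             # First Class
```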

All Three Methods Side-by-Side (Same Data)

Source | A&R StdDev | A&R %TV (σ) | ANOVA Variance | ANOVA %TV (σ²) | EMP Variance | EMP %TV (σ²)
Repeatability | 0.202 | 17.61% | 0.0400 | 3.39% | 0.0407 | 3.1%
Reproducibility | 0.230 | 20.04% | 0.0515 | 4.37% | 0.0531 | 4.1%
R&R | 0.306 | 26.68% | 0.0914 | 7.76% | 0.0938 | 7.2%
Part-to-Part | 1.104 | 96.37% | 1.086 | 92.24% | 1.216 | 92.8%
Total | 1.146 | — | 1.178 | 100% | 1.310 | 100%
⚠️

Why the Average & Range method is misleading: Standard deviations are not additive (σ_total ≠ σ_parts + σ_ms), so the % column doesn't sum to 100% and is mathematically incorrect for decision-making. The 26.68% R&R figure from the Avg & Range method on this same data looks "marginal" under AIAG criteria, while ANOVA and EMP correctly show 7–8% — clearly acceptable. Bottom line: use ANOVA or EMP.

Which Method to Use?

Avg & Range

Only use if hand calculations are required with no software. Always convert to variance before interpreting. Not recommended.

ANOVA

AIAG-preferred. Detects operator×part interaction. Best for automated environments and PPAP submissions. Use this by default.

EMP

Adds control chart validation and the Monitor Class framework. Use when you want to understand what the gauge can actually do for process control.

Attribute Measurement System Analysis

Attribute gauges produce finite categories (pass/fail, good/bad, or colour grades). Standard GR&R methods don't apply — instead AIAG uses Cohen's Kappa for agreement and Effectiveness for decision accuracy.

Cross-Tabulation and Cohen's Kappa

Kappa measures inter-rater agreement beyond what chance alone would produce.

Cohen's Kappa Formula
κ = (p_o − p_e) / (1 − p_e)
p_o = observed agreement  |  p_e = agreement expected by chance
Interpretation
κ ≥ 0.9 → Excellent
0.7 ≤ κ < 0.9 → Acceptable
κ < 0.7 → Inadequate — investigate
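A self-contained sketch of the κ arithmetic for two appraisers rating the same parts pass/fail; the judgment vectors are invented for illustration:

```python
import numpy as np

# Hypothetical accept(1)/reject(0) calls by two appraisers on 20 parts
a = np.array([1,1,0,1,0,1,1,0,1,1,1,0,1,1,0,1,1,1,0,1])
b = np.array([1,1,0,1,1,1,1,0,1,0,1,0,1,1,0,1,1,1,1,1])

p_o = np.mean(a == b)                    # observed agreement: 0.85
# chance agreement from each appraiser's marginal accept/reject rates
p_e = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())   # 0.60
kappa = (p_o - p_e) / (1 - p_e)          # 0.625 → inadequate, investigate
```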

AIAG Attribute MSA Example (Table III-C 3)

Pair | Kappa | Verdict
Appraiser A vs B | 0.86 | Good agreement
Appraiser B vs C | 0.79 | Good agreement
Appraiser A vs C | 0.78 | Good agreement
Appraiser | κ vs Reference | Effectiveness | Miss Rate | False Alarm | Verdict
A | 0.88 | — | 6.3% | 4.9% | —
B | 0.92 | 90% | 6.3% | 2.0% | —
C | — | 80% | 12.5% | — | Unacceptable

Effectiveness Acceptance Criteria (Table III-C 6)

Decision | Effectiveness | Miss Rate | False Alarm Rate
Acceptable | ≥ 90% | ≤ 2% | ≤ 5%
Marginal | 80–90% | 2–5% | 5–10%
Unacceptable | < 80% | > 5% | > 10%
📌

Important AIAG caution: A 90% agreement rate on a process with Pp=1.0 doesn't mean 90% of bad parts are caught. Bayes' Theorem must be applied — the probability a rejected part is truly bad depends on the underlying defect rate. At very low defect rates, most "rejected" parts are actually false alarms.
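A quick illustration of that caution, with all rates assumed for the example (90% detection, 5% false alarms, 1,000 ppm defect rate):

```python
p_defect = 0.001       # underlying defect rate (1,000 ppm, assumed)
p_rej_bad = 0.90       # P(reject | truly bad), the detection rate
p_rej_good = 0.05      # P(reject | truly good), the false alarm rate

p_reject = p_rej_bad * p_defect + p_rej_good * (1 - p_defect)
p_bad_given_reject = p_rej_bad * p_defect / p_reject
print(f"{p_bad_given_reject:.1%}")   # ≈ 1.8%: most rejections are false alarms
```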

Signal Detection Approach (for %GRR)

When variable reference data is available, the gray zone width between the last universally-accepted and first universally-rejected part estimates 6σ_GRR:

Boundary Analysis — Gauge Discrimination
d_USL = last-accepted-by-all → first-rejected-by-all (at USL)
d_LSL = same calculation at LSL
d = average of d_USL and d_LSL
GRR_boundary = d / 5.15  (5.15σ = 99% spread)

📊 Attribute MSA Summary

📌

No single appraiser in the AIAG example met ALL three criteria simultaneously. This is the key finding — a system-level decision is needed.

  • Kappa > 0.75

    All pairs met this. Appraisers agree with each other well.

  • Effectiveness

    Only B reached ≥90%. A and C are marginal/unacceptable.

  • Miss Rate

    All three had 6.3%+ miss rate, exceeding the ≤2% threshold. Training needed.

How GRR Distorts Your Cp — AIAG Appendix B

The most important and most overlooked MSA insight: your observed Cp is always lower than your actual process Cp because measurement error inflates the observed variation. Appendix B of AIAG MSA 4th Ed. gives the exact formula.

AIAG Appendix B — Exact Relationships
Process-variation basis
%EV = 100 × (EV / TV)
%AV = 100 × (AV / TV)
%GRR = 100 × (GRR / TV)
%PV = 100 × (PV / TV)
Tolerance basis
%EV = 100 × (EV / Tol)
%AV = 100 × (AV / Tol)
%GRR = 100 × (GRR / Tol)
TV = √(GRR² + PV²)

What This Means in Practice

A high GRR makes your process capability look worse than it really is. This has real consequences: a process may be denied production approval because of its measurement system, not because of the process itself.

🚨

Critical insight: At GRR=70% with Cp_obs=1.30, the actual process Cp is still only 1.04 — barely capable. This means high GRR doesn't just disguise a capable process — it may be masking a barely capable one. Always investigate GRR before concluding a process is incapable.

📊 Appendix B Table — Observed vs Actual Cp

Actual Cp = 1.30, GRR varies (process-based)

GRR % | Cp_obs (process) | Cp_obs (tolerance)
10% | 1.29 | 1.29
20% | 1.27 | 1.26
60% | 1.04 | 0.81
70% | 0.93 | 0.54
90% | 0.57 | never

At GRR=50%, tolerance-based Cp_obs drops to 0.99 — looks incapable even though actual Cp=1.30!

📊 Observed Cp vs Actual Cp (Actual = 1.30) as GRR increases: GRR 10% → 1.29 (nearly unaffected) · GRR 30% → 1.24 (−5% loss) · GRR 50% → 1.13 (−13% loss) · GRR 70% → 0.93 (below 1.0 ✗)

Measurement Tools, Destructive & Non-Destructive Testing

Before you can analyse measurement system variation, you need to select the right measurement tool and understand its capabilities. The Rule of 10 governs tool selection; destructive and NDT methods determine what kind of testing is possible.

Measurement Tools — Precision Hierarchy

Tool | Least count / Resolution | Principle | Typical use
Scale / Tape Measure | 1 mm or 0.5 mm | Direct linear measurement against graduated scale | Rough dimensions, layout
Vernier Caliper | 0.1 mm or 0.05 mm | Main scale + vernier scale alignment | OD/ID/depth/step measurements
Micrometer | 0.01 mm | Screw thread advancement per revolution | Shaft/bore diameters, wall thickness
Gage Blocks (Slip Gauges) | 0.001 mm (1 µm) | Precision ground blocks, stacked by "wringing" (molecular adhesion up to 330 N pull force) | Calibration reference, setting instruments
Optical Comparator | Depends on magnification | Magnified silhouette projected on screen, measured against prescribed limits | Complex profiles, thread forms, gear teeth
💡

Rule of 10 (10:1 Rule): The measuring instrument resolution should divide the tolerance into at least 10 parts. Example: tolerance = ±0.05mm (range = 0.10mm) → minimum instrument resolution = 0.01mm → Digital Vernier (0.01mm) is acceptable; tape measure (1mm) is not. Calibration instruments should be 10× better than the measuring instrument.

Destructive Testing

Destructive tests damage or destroy the test piece. Used when the test must measure failure — cannot be used for 100% inspection. Drives the need for acceptance sampling.

🔩 Tensile Test

Stress-Strain curve analysis. Pulls the specimen to failure.

  • Stress = Force / Area (Pa = N/m²)
  • Strain = ΔLength / Length (unitless)
  • Measures: UTS, yield strength, elongation, Young's modulus
  • Curve shapes: ductile steel, brittle (concrete/carbon fibre), non-ferrous
💥 Charpy Impact Test (V-notch)

Measures notch toughness — ability to absorb energy during fracture. A pendulum swings and strikes a notched specimen.

  • Result: energy absorbed (Joules)
  • Critical for low-temperature applications
  • Identifies brittle-ductile transition temperature
🔄 Fatigue Test

Applies cyclic loading until failure. Most engineering failures are fatigue-related.

  • Determines S-N curve (stress vs cycles to failure)
  • Identifies endurance limit (some steels)
  • Critical for rotating machinery, aircraft structures

Non-Destructive Testing (NDT)

NDT methods inspect materials and components without causing damage — enabling 100% inspection for critical items. Each method has specific capabilities and limitations.

Method | Principle | Detects | Applicable materials
Radiography (X-ray / Gamma) | Radiation passes through material; defects absorb differently and show on film/detector | Internal voids, porosity, inclusions, weld defects | Most materials — metals, composites, castings
Ultrasonic Testing (UT) | Sound waves >20 kHz transmitted into material; reflections from defects detected | Internal defects, thickness measurement, delaminations | Metals, composites, welds
Magnetic Particle (MT) | Magnetic field applied; field leaks at surface/near-surface defects; magnetic particles accumulate | Surface and near-surface cracks | Ferromagnetic materials ONLY (steel, iron)
Liquid Penetrant (PT) | Dye penetrant drawn into surface cracks by capillary action; developer reveals defects | Surface-breaking defects only | Any material — magnetic AND non-magnetic
Hardness Testing | Indenter pressed into surface; hardness = resistance to indentation (Vickers HV, Brinell HB, Rockwell HR) | Material hardness, heat treatment verification | Most solid materials

Crossed vs Nested GR&R Studies

Crossed GR&R Study

Each operator measures every part, and every part is measured multiple times (replicates). This enables separation of the EV (repeatability) and AV (reproducibility) components.

  • ✓ Standard AIAG GR&R method
  • ✓ Used for non-destructive measurements
  • ✓ Provides separate EV, AV, and interaction estimates
  • ✓ Typical design: 3 operators × 10 parts × 2 replicates
Nested GR&R Study

Each operator measures a different set of parts — typically because the measurement destroys the part. Parts are nested within operators; cannot be measured by more than one operator.

  • ✓ Used for destructive tests (tensile, hardness, chemical)
  • ✓ Cannot separate repeatability from part-to-part variation within operator
  • ⚠️ Reproducibility is confounded with part variation
  • ✓ Requires more parts than crossed design
Quality Philosophy

The foundational reference for quality engineering. Covers the evolution of quality, the philosophies of every major quality pioneer, continuous improvement frameworks, strategic planning, facilitation tools, customer relations, supplier management, and barriers to quality improvement.

Evolution of Quality & the Philosophies That Shaped It

Quality management evolved from pure inspection through statistical control, quality assurance, and total quality management into today's business excellence frameworks. Each pioneer contributed a distinct, testable philosophy that forms the foundation of modern quality engineering.

📊 Evolution of Quality — Key Milestones
Inspection → QC → SPC → QA → TQM → Business Excellence
1901 Standardization · 1924 Shewhart control charts · 1950s Deming/Juran → Japan · 1960s Ishikawa/Taguchi · 1979 Crosby — Quality is Free · 1987 Motorola Six Sigma · 1990s+ TQM → Business Excellence

W. Edwards Deming — 14 Points & System of Profound Knowledge

Deming taught that 85–94% of quality problems are caused by the system itself — not the workers. His message to Japan in the 1950s transformed their manufacturing. His framework rests on four areas of Profound Knowledge: appreciation for a system, knowledge about variation, theory of knowledge, and psychology.

# | Point | Core idea | Quality engineering implication
1 | Create Constancy of Purpose | Long-term commitment to improvement; customer focus; invest in innovation & training | Drives design for reliability, not just today's spec compliance
2 | Adopt the New Philosophy | Management must lead change; be prepared for transformation | Quality is not a department — it is a system responsibility
3 | Cease Dependence on Mass Inspection | Build quality into the process; inspection is too late & too costly | Prevention > detection; PFMEA before production, not rework after
4 | End Lowest-Price Purchasing | Move toward single suppliers on long-term trust; multiple suppliers = more variability | Supplier qualification programs, approved vendor lists
5 | Improve Constantly and Forever | PDCA; reduce variation; engage all employees | SPC, DMAIC, continuous capability improvement
6 | Institute Training on the Job | People must know how to do their job; training includes tools and improvement methods | Calibration training, GR&R awareness, SPC chart reading
7 | Institute Leadership | Supervisors are coaches, not police; understand processes | Process owners empowered to stop the line on defects
8 | Drive Out Fear | Mutual respect; workers feel valued and can flag problems freely | Open reporting of defects; psychological safety for quality escalation
9 | Break Down Barriers | Cross-functional teams; internal customer concept; common vision | APQP teams, design-manufacturing-quality integration
10 | Eliminate Slogans & Posters | Slogans assume people cause problems — the system does | Fix the process, not the person; root cause analysis, not blame
11 | Eliminate Numerical Quotas | Quotas without a plan are demoralising; substitute leadership | Capability targets backed by process improvement plans
12 | Remove Barriers to Pride | Abolish annual merit rating that creates competition; recognise craftsmanship | Team-based quality improvement rewards over individual rankings
13 | Institute Education & Self-Improvement | Workers learn new skills to face future challenges | Statistical literacy training; professional development
14 | Take Action — Transform | Transformation is everybody's job; cultural change starts at the top | Quality culture deployment through management commitment
💡

Deming's Chain Reaction: Improve quality → costs decrease (less rework, fewer mistakes) → productivity improves → capture the market → stay in business → provide more jobs. The chain begins with quality, not with cost-cutting.

Joseph Juran — The Quality Trilogy & Fitness for Use

Juran defined quality as fitness for use — not conformance to specification. He emphasised top management involvement, project-by-project improvement, and the Pareto principle (vital few vs. useful many). His Quality Control Handbook (1951) remains the definitive reference.

Quality Planning

Preparing to meet quality goals. Identify customers, determine their needs, develop product/process features that respond to those needs, establish quality goals.

Quality Control

Meeting quality goals during operations. Evaluate actual performance, compare to goals, act on the difference. The ongoing process of holding the gains — SPC, inspection, audits.

Quality Improvement

Breaking through to unprecedented levels of performance. Project-by-project — select the project, organise the team, diagnose causes, implement remedies, hold the gains.

Juran's 10 Steps to Quality Improvement
  1. Build awareness of the need and opportunity for improvement
  2. Set goals for improvement
  3. Organise to reach the goals (establish a quality council, identify problems, select projects)
  4. Provide training
  5. Carry out projects to solve problems
  6. Report progress
  7. Give recognition
  8. Communicate results
  9. Keep score of improvements achieved
  10. Maintain momentum by making annual improvement part of the regular systems and processes

Philip Crosby — Four Absolutes & Quality is Free

Crosby defined quality as conformance to requirements — not goodness or elegance. His 1979 book Quality is Free argued that the cost of poor quality always exceeds the cost of preventing defects. His message to management: the system causes non-conformance, and prevention — not appraisal — is the correct system.

The Four Absolutes of Quality
  1. Definition: Quality is conformance to requirements — not elegance. Do It Right the First Time (DIRFT).
  2. System: The system of quality is prevention, not appraisal. An error that doesn't exist can't be missed.
  3. Standard: The performance standard is zero defects — a management standard, not a motivational slogan.
  4. Measurement: Quality is measured by the Price of Non-Conformance — cost of doing things wrong.
Price of Conformance vs Non-Conformance

Price of Conformance (POC): All expenses necessary to make things right. Quality functions, prevention efforts, quality education, audits.

Price of Non-Conformance (PONC): All expenses involved in doing things wrong — fixing problems, correcting orders, rework, scrap, warranty claims, customer returns.

Crosby's claim: PONC always > POC ∴ Quality is Free

Walter A. Shewhart — Father of Statistical Quality Control

Shewhart invented the control chart in 1924 at Western Electric's Hawthorne Works and introduced the PDCA (Plan-Do-Check-Act) cycle. He was the first to distinguish between common cause (chance) variation and special cause (assignable) variation — the foundational insight behind all SPC.

Key Contributions
  • 📈 Invented the control chart (1924) — X̄-R, p, c, u charts
  • 🔄 Developed the PDSA/PDCA cycle (Shewhart Cycle — later popularised by Deming)
  • 📊 Distinguished common cause (system) from special cause (assignable) variation
  • 📖 Published Economic Control of Quality of Manufactured Product (1931)
Variation Types

Common Cause (Chance): Inherent in the process. Many small, independent sources. Stable and predictable. Only the system (management) can reduce it.

Special Cause (Assignable): An identifiable, specific source outside the system. Intermittent and unpredictable. Operators and engineers can find and fix these.

Pioneer Philosophy Quick-Reference

Pioneer | Quality defined as | Primary framework | Key exam trigger words
Deming | Reduction of variation; customer satisfaction | 14 Points, System of Profound Knowledge, PDCA | "Common cause / special cause", "chain reaction"
Juran | Fitness for use | Quality Trilogy (Planning, Control, Improvement), 10 Steps | "Fitness for use", "project-by-project", "vital few"
Crosby | Conformance to requirements | Four Absolutes, Zero Defects, PONC/POC | "Conformance to requirements", "zero defects", "prevention"
Shewhart | Statistical control | Control charts, PDCA cycle, common/special cause | "Control chart", "assignable cause", "PDSA"
Taguchi | Minimum loss to society | Loss function, robust design, parameter/tolerance design | "Loss function", "nominal is best", "signal-to-noise"
Ishikawa | Total quality through all employees | Cause-and-effect diagram, QC circles, 7 tools | "Fishbone", "cause-and-effect", "QC circles"

Continuous Improvement Frameworks

Five major CI frameworks every quality engineer needs to understand — how they relate, where they differ, and when to apply each.

Lean — Eliminate Waste, Maximise Flow

Lean originated with Ford's mass production principles (1910s) and was systematised into the Toyota Production System (TPS) in the 1950s. James Womack, Daniel Roos, and Daniel Jones documented it for the West in The Machine That Changed the World (1990). Lean identifies eight types of waste (DOWNTIME) and organises the entire enterprise around delivering value at the rate demanded by the customer.

The 5 Lean Principles
  1. Value: Specify what creates value from the customer's perspective — not the producer's.
  2. Value Stream: Map all steps in the process chain; eliminate non-value-adding steps.
  3. Flow: Make value-creating steps flow without interruption, batching, or waiting.
  4. Pull: Produce only what is needed by the customer — short-term response to demand rate (takt time).
  5. Perfection: Continuously pursue elimination of all waste; the process never ends.
Lean Benefits
  • ✓ Reduced waste (DOWNTIME: Defects, Overproduction, Waiting, Non-utilised talent, Transport, Inventory, Motion, Extra processing)
  • ✓ Improved quality and customer satisfaction
  • ✓ Reduced inventory and cycle time
  • ✓ Flexible manufacturing capability
  • ✓ Safer workplace and improved employee morale

Six Sigma — Reduce Variation to Near-Zero Defects

Motorola developed Six Sigma in 1987, raising quality standards dramatically. AlliedSignal (now Honeywell), GE, Dow Chemical, DuPont, Whirlpool, and IBM adopted it in the mid-1990s, proving its cross-industry applicability.

🎯
Know CTQs

Identify what's critical to quality from the customer's perspective

📉
Reduce Defects

Drive DPMO down; measure defects per million opportunities

Centre on Target

Minimise deviation of mean from nominal target value

Reduce Variation

Tighten standard deviation; narrow the process spread

Theory of Constraints (TOC) — Focus on the Weakest Link

Introduced by Eliyahu Goldratt in The Goal (1984). TOC holds that every system has exactly one constraint limiting overall throughput at any given time. Improving a non-constraint does not improve the system — only improving the current constraint does.

TOC Step | Action | Key principle
1. Identify | Find the current constraint — the weakest link in the chain | Physical, Policy, Paradigm, or Marketplace constraints
2. Exploit | Squeeze maximum performance from the constraint using existing resources — no new investment yet | Don't waste constraint capacity on anything non-essential
3. Subordinate | Align all other activities to support the constraint's pace | A non-constraint running faster than the constraint builds WIP, not output
4. Elevate | If the constraint persists after exploiting and subordinating, invest to break it | Add capacity, change the process, redesign
5. Repeat | Once broken, a new constraint will emerge — return to step 1 | Continuous improvement is never finished

Total Quality Management (TQM)

TQM is a management approach to achieving customer satisfaction through every person in the organisation working to continuously improve products, processes, and services. Unlike Six Sigma (project-focused) or Lean (waste-focused), TQM is a cultural philosophy. Most quality awards (Baldrige, EFQM, Deming Prize) are grounded in TQM principles.

TQM Core Principles
  • 🎯 Customer focus — internal and external customers
  • 🔄 Continuous improvement (Kaizen) — forever and ever
  • 👥 Total employee involvement — every person owns quality
  • 📊 Process approach — manage activities as interconnected processes
  • 🤝 Supplier partnerships — extend quality into the supply chain
CI Framework Comparison
Framework | Primary focus | Methodology
Lean | Waste elimination, flow | Value stream mapping, 5S, Kaizen
Six Sigma | Variation reduction, defects | DMAIC, statistical analysis
TOC | Throughput, bottleneck | 5 focusing steps, drum-buffer-rope
TQM | Culture, customer satisfaction | Quality awards, customer surveys
SPC | Process stability and capability | Control charts, capability studies

Strategic Planning, Deployment & Information Systems

Strategic planning aligns the quality function with organisational goals — covering planning frameworks, deployment tools, and performance measurement including the Balanced Scorecard, leading vs lagging indicators, and project management techniques.

Strategic Planning — VMOSA Framework

V
Vision

The dream — what the organisation aspires to become in the long term

M
Mission

What the organisation does and why it exists — the purpose statement

O
Objectives

How much of what — specific, measurable goals to achieve the mission

S
Strategies

How — broad approaches used to achieve each objective

A
Action Plans

Who will do what by when — the specific tasks assigned to specific people

Balanced Scorecard — Kaplan & Norton

Developed by Robert Kaplan and David Norton, the Balanced Scorecard translates strategy into four perspectives of performance measurement — preventing over-reliance on financial metrics alone. Quality professionals use it to frame the value of quality investments in language executives understand.

💰 Financial Perspective

How do we look to shareholders? Revenue growth, profitability, cost reduction, ROI. Quality metric: Cost of Poor Quality (COPQ) as % of sales revenue.

🎯 Customer Perspective

How do customers see us? Satisfaction scores, NPS, on-time delivery, defect rates in the field, warranty claims per unit.

⚙️ Internal Processes Perspective

What must we excel at internally? Process yield, Cpk levels, first-pass yield, defect rate, audit outcomes, cycle time.

📚 Learning & Growth Perspective

Can we continue to improve and create value? Training hours, certifications (ASQ, IASSC), employee engagement, suggestion rate, new quality tools adopted.

Leading vs Lagging Indicators

Type | Definition | Characteristics | Quality examples
Lagging Indicators | Post-event (output) measures — what has already happened | Easy to measure, historically accurate, but cannot prevent what already occurred | DPMO, defect rate, warranty returns, customer complaints, scrap cost, Cpk
Leading Indicators | Predictive (input) measures — early signals of future performance | Difficult to identify and validate; harder to measure; not guaranteed predictors | Training hours, PFMEA completion %, process audit scores, SPC chart compliance, supplier qualification status
💡

Best practice: Use a mix of both. Lagging indicators tell you what happened; leading indicators tell you where you're heading. A dashboard with only lagging metrics is a rearview mirror — add leading metrics to steer the process proactively.

Stakeholder Identification & Analysis

ISO 9001:2015 clause 4.2 requires organisations to determine interested parties and their requirements. Stakeholder analysis maps each party by their level of interest and power/influence, then defines the appropriate engagement strategy.

Stakeholder Power-Interest Grid
  • KEY PLAYERS — High power, High interest → Manage closely
  • LATENTS — High power, Low interest → Keep satisfied
  • DEFENDERS — Low power, High interest → Keep informed
  • APATHETICS — Low power, Low interest → Monitor
Stakeholder Examples
  • 👔 Internal: Owners, managers, employees, partners
  • 🏭 Supply chain: Suppliers, sub-tier suppliers
  • 🛒 Market: Customers, end users
  • 🏛️ External: Regulators, industry associations, media, local community
ISO 9001:2015 §4.2 — Monitor and review stakeholder requirements

Quality Information System (QIS)

A QIS is the data-centric infrastructure of the quality management function — the systems used to collect, store, analyse, and report quality-related data across the organisation.

Data Captured by a QIS
  • 📋 Design reviews and change records
  • 🔍 Audit findings and corrective actions
  • ⚠️ Non-conformances and dispositions
  • 🔧 Repairs, returns, warranty claims
  • 😊 Customer satisfaction surveys
  • 📊 Test reports, certificates, performance data
QIS Benefits
  • ✓ Identifies priorities for improvement investment
  • ✓ Tracks performance of quality initiatives and ROI
  • ✓ Enables competitor performance benchmarking
  • ✓ Breaks silos — all departments access the same quality data
  • ✓ Supports fact-based decision making at every level

Team Dynamics, Leadership & Facilitation Tools

Effective quality improvement requires high-performing teams — covering team types, the Tuckman model of team development, team roles, and the facilitation tools used in quality projects.

Team Types

Team Type | Description | Quality context
Functional | Members from same department/function with similar expertise | Quality lab team, inspection team, calibration group
Cross-Functional | Members from multiple departments working on a shared goal | APQP team, PFMEA team, 8D corrective action team
Virtual | Geographically dispersed team relying on technology to collaborate | Global supplier quality teams, multi-site audit teams
Self-Managed | Team with authority to set own goals, methods, and schedules | Autonomous production cells with built-in quality responsibility
Quality Circles | Voluntary groups of front-line workers meeting regularly to identify and solve quality problems — introduced by Ishikawa | Shop-floor improvement groups, Kaizen circles

Tuckman Model of Team Development

Bruce Tuckman's five-stage model (1965, extended 1977) describes the predictable journey teams undergo from formation to high performance. Understanding which stage a team is in allows a leader or facilitator to apply the right intervention.

👋
FORMING

Members first come together; polite, uncertain about roles and goals; depend on leader for direction

STORMING

Conflict emerges; teamwork harder than expected; power struggles; important not to suppress but navigate

🤝
NORMING

Team moves beyond storming; norms established; collaboration improves; roles clarified

🚀
PERFORMING

High performance; team is self-directing; interdependent; focused on goals

👋
ADJOURNING

Task complete; team disbands; celebrate achievements, capture lessons learned

Team Roles — Leader, Facilitator, Coach, Members

Role | Primary responsibilities | Key distinction
Leader | Provides direction; clarifies roles; establishes ground rules; ensures goal completion; conducts meetings; assigns tasks | Has formal authority and accountability for the team's output
Facilitator | Helps the team understand its objective and how to achieve it; guides process without dictating content | No formal authority to make decisions — leads by process, not position
Coach | One-to-one support after training; first point of contact for issues; uses GROW model | Develops individuals; not the same as a trainer (one-to-many)
Members | Participate actively in meetings; perform assigned tasks; contribute ideas in brainstorming | Own the work; the team's subject matter experts
💡

GROW Coaching Model: Goal — what does the team/individual want to achieve? Reality — what is the current state and what challenges exist? Obstacles — what is stopping progress? Way forward — what specific steps will be taken and by when?

Facilitation Tools

🧠 Brainstorming

Group or individual technique to generate ideas spontaneously for a specific problem. Quantity over quality — defer all judgment during generation.

Four Rules:
  • 1. Focus on quantity — more ideas = more options
  • 2. Withhold criticism — no evaluation during generation
  • 3. Welcome unusual ideas — wild ideas often spark practical ones
  • 4. Combine and improve — build on others' ideas (1+1=3)
📋 Nominal Group Technique (NGT)

Structured process for problem identification, solution generation, and group decision-making. Prevents dominant voices from controlling the output.

Five Steps:
  1. Introduction and explanation of the problem
  2. Silent individual generation of ideas (written)
  3. Round-robin sharing — one idea per person per turn
  4. Group discussion and clarification
  5. Voting and ranking to reach group decision
🗳️ Multi-Voting

Used after brainstorming generates a long list — reduces/narrows the list using group consensus without endless debate.

Each member selects their top N ideas and ranks them (e.g. top 5, scored 5 down to 1). Scores are summed — highest total = group priority. Repeat until a manageable shortlist remains.
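A minimal Python sketch of that scoring scheme (the ballot contents are hypothetical):

```python
from collections import defaultdict

def multi_vote(ballots, top_n=5):
    """Each ballot is a ranked list of ideas (best first). The top pick
    scores top_n points, the next top_n - 1, and so on; summed totals
    decide the group shortlist."""
    scores = defaultdict(int)
    for ballot in ballots:
        for rank, idea in enumerate(ballot[:top_n]):
            scores[idea] += top_n - rank
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ballots from three team members
ballots = [
    ["fixture redesign", "operator training", "new gauge", "5S", "lighting"],
    ["operator training", "fixture redesign", "lighting", "5S", "new gauge"],
    ["fixture redesign", "new gauge", "operator training", "lighting", "5S"],
]
for idea, score in multi_vote(ballots):
    print(f"{score:>2}  {idea}")
```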

⚖️ Force Field Analysis

Identifies and maps the forces driving change against the forces resisting it. Developed by Kurt Lewin. Used in change management and improvement planning.

Driving forces (strengthen these): customer demand for fewer defects, competitive pressure, lower downtime, increased sales opportunity. Restraining forces (weaken these): initial investment cost, fear of new technology, habit/inertia.

Conflict Resolution — Thomas-Kilmann Model

Style | Concern for Self | Concern for Others | When to use
Competing | High | Low | Safety emergencies; critical quality hold decisions; when you know you're right
Collaborating | High | High | Complex quality problems requiring buy-in from all parties; best long-term solution matters
Compromising | Medium | Medium | When a temporary solution is needed; when both parties have equally valid goals
Avoiding | Low | Low | When the issue is trivial; when more information is needed before engaging
Accommodating | Low | High | When preserving the relationship matters more than the outcome; when you're wrong

Customer Relations & Supplier Management

Quality professionals must manage both directions of the value chain — understanding and capturing customer requirements, and ensuring suppliers deliver conforming product and services reliably.

Supplier Lifecycle Management

With mid-to-large corporations spending ~50% of revenue on purchased goods and services, supplier management is critical to organisational success. The Supplier Lifecycle Management framework is a structured, end-to-end approach to managing suppliers transparently, mitigating risk, reducing costs, and building long-term partnerships.

① Selection & Qualification

Identify → Shortlist → Prequalify → Bidders list → RFP/RFQ → Evaluate → Award. Includes sub-tier supplier identification.

② Performance Monitoring

Set performance expectations; process reviews; evaluations against KPIs (cost, quality, schedule, responsiveness); improvement plans; exit strategies.

③ Classification

Tier suppliers: Non-approved → Approved → Preferred → Certified → Partnership → Disqualified. Classification drives audit frequency and oversight level.

④ Partnerships & Alliances

Develop strategic customer-supplier partnerships; shared improvement initiatives; joint development; supply chain resilience strategies.

Supplier Selection Process

Step | Activity | Key considerations
1. Identify | Find potential suppliers; new suppliers may offer cost or quality advantage; promote local suppliers | Market research, industry directories, referrals
2. Shortlist | Screen to avoid late delivery, poor quality, non-responsive suppliers | Market reputation, public information, financial health
3. Prequalify | Assess financial stability, capacity, quality certifications (ISO 9001), client approvals | On-site surveys, questionnaires, certificate verification
4. Bidders List | Maintain a qualified list to avoid repeating prequalification each time | Approved Vendor List (AVL) maintenance
5. Request Bids | RFP — buyer states preferences, bidder explains how they'll meet them. RFQ — buyer provides exact spec, bidder quotes a price | Choose RFP when requirements are not fully defined
6. Evaluate Bids | Score against pre-determined criteria: price, quality, schedule, commercial terms, financial stability, production capability, HSE responsibility | Weighted scoring matrix; multi-person evaluation team
7. Award | Place Purchase Order with selected supplier | Contractual quality requirements, inspection criteria, escalation process

Supplier Performance Monitoring Parameters

💰 Cost
  • Under/over budget variance
  • Cost savings achieved
  • Cost-reduction proposals
✅ Quality
  • Incoming defect rate (PPM)
  • Returns and failures
  • Corrective action closure rate
📅 Schedule
  • On-time delivery %
  • Shortage incidents
  • Lead time vs committed
📞 Responsiveness
  • Response time to queries
  • Flexibility to order changes
  • Escalation engagement

Risk Management, Business Continuity & Barriers to Quality

Risk — ISO 31000 Definition & Framework

Risk = the effect of uncertainty on objectives (ISO 31000:2009). An effect is a deviation from the expected — positive (opportunity) or negative (threat). A risk that has already occurred is reclassified as an issue. Risk is characterised by its potential consequences and the likelihood of occurrence.

Risk Management Step | Activity | Quality tool
1. Identify Risks | List all potential threats and opportunities that could affect objectives | FMEA, HAZOP, brainstorming, risk register
2. Prioritise Risks | Score by probability × impact; focus resources on high-priority risks | Risk matrix (5×5), RPN in FMEA
3. Mitigation Control | Define actions to reduce probability and/or impact of each risk | Control plans, poka-yoke, redundancy
4. Mitigation Effectiveness | Monitor whether controls are working; update risk register | KPIs, audits, leading indicators tracking
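A minimal Python sketch of step 2, probability × impact scoring on 1–5 scales. The banding thresholds and the register entries are hypothetical — set them per your own risk-acceptance criteria:

```python
def risk_score(probability, impact):
    """Probability x impact on 1-5 scales -> score 1-25."""
    return probability * impact

def priority_band(score):
    """Hypothetical banding for a 5x5 matrix."""
    if score >= 15:
        return "HIGH — mitigate now"
    if score >= 8:
        return "MEDIUM — plan controls"
    return "LOW — monitor"

register = [
    ("Supplier insolvency",      2, 5),
    ("Gauge calibration lapse",  3, 3),
    ("Incoming material mix-up", 4, 4),
]
for risk, p, i in sorted(register, key=lambda r: r[1] * r[2], reverse=True):
    s = risk_score(p, i)
    print(f"{risk:<26} P={p} I={i} score={s:>2} -> {priority_band(s)}")
```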
Business Continuity Plan (BCP)

A system of prevention and recovery for potential threats to the organisation. Covers extreme, existential scenarios.

Common threats: Fire, flood, earthquake, strike, war, power outage, cyber attack, terrorist attack.

Contingency Planning

A plan for outcomes other than the expected — less extreme than BCP. Covers probable disruptions.

Examples: Supplier bankruptcy, price/currency fluctuation, component discontinuation, key personnel departure.

Resiliency

The capacity to rapidly adapt and recover from internal or external disruptions. IBM identifies six building blocks of resilience:

Recovery · Hardening · Redundancy · Accessibility · Diversification · Autonomic Computing

Supply Chain Risk Categories

Where | Risk category | Examples
At Supplier | Natural causes | Flood, earthquake, wildfire destroying plant or inventory
At Supplier | Man-made causes | Strike, fire, civil unrest, quality failure, management change
At Supplier | Economic causes | Insolvency, sub-supplier failure, currency collapse, credit freeze
In Transit | Natural or man-made | Port closure, transport strike, customs hold, damage in transit
On Receipt | Quality or reputational | Defective product, counterfeit parts, labelling errors, regulatory non-compliance

Barriers to Quality Improvement

Understanding why quality improvement initiatives fail is as important as knowing how to run them. Engineering practice — and certification exams — test recognition of these barriers and their countermeasures.

Common Barriers
  • 🔀 Confusion over the definition of quality — when quality means different things to different stakeholders, initiatives fragment
  • 👤 Lack of leadership — quality improvement without visible management commitment fails at the first obstacle
  • Short-term thinking — quality ROI is often long-term; pressure for immediate financial results kills improvement programs
  • 📊 Lack of data — unable to quantify the magnitude of the problem or the benefit of fixing it
  • 🎓 Insufficient qualified people — quality improvement requires statistical literacy and tool expertise (Black Belt, quality engineering, etc.)
Countermeasures
  • ✓ Align on a single, clear quality policy — signed by top management and communicated to all
  • ✓ Visibly involve senior leaders in quality reviews, audits, and improvement projects
  • ✓ Link quality metrics to the Balanced Scorecard to give them financial language
  • ✓ Build a QIS to capture and surface data that quantifies the cost of poor quality
  • ✓ Invest in CQT/Black Belt certifications; develop internal quality competency

ASQ Code of Professional Ethics — Three Pillars

① Integrity & Honesty

Be truthful in all professional interactions. Accurately represent qualifications, certifications, and affiliations. Offer services only within areas of genuine competence. Make decisions in an objective, factual manner.

② Responsibility, Respect & Fairness

Hold paramount the safety, health, and welfare of individuals and the public. Treat others fairly, courteously, with dignity, and without discrimination. Act in a socially responsible manner.

③ Proprietary Information & Conflicts

Protect confidential information; never use it for personal gain. Disclose and avoid real or perceived conflicts of interest. Give credit where due; do not plagiarise. Obtain and document permission to use others' intellectual property.

Classification of Quality Characteristics

Understanding what quality means to different stakeholders — from product performance to service interactions — is foundational to the quality engineer's Body of Knowledge. Three frameworks define quality characteristics at different levels of abstraction.

Garvin's 8 Dimensions of Product Quality

David Garvin (Harvard, 1987) proposed that quality is multi-dimensional — a product can be high quality on one dimension and poor on another. This prevents organisations from optimising a single metric at the expense of overall customer value.

# | Dimension | Definition | Quality engineering relevance
1 | Performance | Primary operating characteristics — does the product do what it should? | CTQ characteristics, functional specifications, Cpk targets
2 | Features | Secondary supplementary attributes that enhance the basic function | Voice of Customer (QFD), feature vs cost trade-offs
3 | Reliability | Probability that the product performs its intended function over time without failure | MTBF, Weibull analysis, bathtub curve, reliability testing
4 | Conformance | Degree to which a product meets pre-established standards and specifications | Cpk, DPMO, attribute inspection, MIL-STD-1916
5 | Durability | Useful life of the product before replacement is preferable to repair | Accelerated life testing, design for reliability
6 | Serviceability | Speed, courtesy, competence, and ease of repair | MTTR, design for maintainability, spare parts availability
7 | Aesthetics / Style | How the product looks, feels, sounds, tastes, or smells — subjective | Visual inspection standards, appearance audits, colour matching
8 | Perceived Quality | Reputation and image — what the customer believes based on brand and word of mouth | Customer satisfaction surveys, NPS, warranty claim rates
💡

Key relationships: Reliability = MTBF/failure rate. Conformance = meets spec/Cpk. Serviceability = MTTR/maintainability. Perceived quality = customer perception/surveys.

SERVQUAL — Service Quality Dimensions

Parasuraman, Zeithaml, and Berry (1985) identified 10 service quality dimensions that customers use to evaluate service. These were later consolidated into 5 dimensions — the RATER model.

Original 10 SERVQUAL Dimensions

  1. Reliability
  2. Responsiveness
  3. Competence
  4. Access
  5. Courtesy
  6. Communication
  7. Credibility
  8. Security
  9. Understanding the customer
  10. Tangibles

Consolidated to 5 — The RATER Model

R — Reliability

The ability to perform the promised service dependably and accurately

A — Assurance

Knowledge and courtesy of employees; their ability to convey trust and confidence

T — Tangibles

Appearance of physical facilities, equipment, personnel, and communication materials

E — Empathy

Provision of caring, individualised attention to customers

R — Responsiveness

Willingness to help customers and provide prompt service

Lean Deep-Dive — Waste, Metrics, SMED & Visual Controls

Lean is built on one fundamental idea: waste exists in all processes at all levels. Eliminating waste is the key to successful lean implementation and the most effective way to increase profitability without capital investment.

Muda, Mura & Muri — The Three Types of Waste

Muda — 無駄
Activity that is wasteful / non-value-adding

Type I Muda (Incidental): Non-value-added tasks that seem necessary — business conditions must change to eliminate them (e.g. regulatory inspections).

Type II Muda (Pure Waste): Non-value-added tasks that can be eliminated immediately — no business justification.

Mura — 斑
Unevenness / variation leading to imbalance

Mura exists when workflow is out of balance or workload is inconsistent. Creates alternating overloading and underloading.

SMED reduces Mura by enabling smaller batch sizes and more frequent changeovers — smoothing out production flow.

Muri — 無理
Overburden — unreasonable stress on people/equipment

For people: too heavy a mental or physical burden — leads to quality errors, injuries, and absenteeism.

For machines: running beyond designed capacity — leads to breakdowns and quality deterioration.

8 Types of Muda — DOWNTIME

The original Toyota Production System identified seven types of muda. Western lean practitioners added an eighth — under-utilised staff (knowledge, talent, and creativity). The DOWNTIME acronym covers all eight (the older TIMWOOD mnemonic covers the original seven):

Letter | Waste | Definition | Example
D | Defects | Sorting, rework, repetition, or making scrap | Welding defects requiring re-weld; wrong labels requiring replacement
O | Overproduction | Producing too much, too early, and/or too fast | Printing 1,000 brochures when only 200 are needed
W | Waiting | People or parts waiting for a work cycle to finish | Operator idle while machine cycles; material waiting in queue
N | Non-utilised talent | Failure to exploit employees' knowledge, skills, and creativity | Asking assembly workers to follow instructions without seeking their improvement ideas
T | Transportation | Unnecessary movement of people or parts between processes | Moving parts from one building to another before assembly
I | Inventory | Materials parked and not having value added to them | Raw material sitting in a warehouse for 3 weeks
M | Motion | Unnecessary movement of people or parts within a process | Operator walking 15 m to get tools that could be stored at the workstation
E | Extra Processing | Processing beyond what the customer requires or demands | Polishing a surface that will be hidden; generating reports nobody reads

Standard Work

Standard Work means doing work in a standard way — one best-known method, followed consistently by all people for that task. It is the foundation of quality, safety, and continuous improvement.

  • ✓ All people perform one task in one way only
  • ✓ Eliminates variation caused by different methods
  • ✓ Makes abnormalities immediately visible
  • ✓ Improvements lead to revised standard work — the PDCA cycle applied to work methods
  • ✓ Not "the boss's way" — the best-known way, documented and agreed
Standard Work Documents
  • Standard Work Chart: Shows sequence of tasks, times, and movement in a cell layout
  • Job Instruction Sheet: Step-by-step WI with quality checkpoints and safety notes
  • Time Observation Sheet: Records actual vs takt time — identifies bottlenecks

Process Flow Metrics — Takt, Cycle, Lead Time & Throughput

Metric | Definition | Formula | Worked example
WIP (Work In Progress) | Partially finished goods in the process waiting for completion | — | 50 units partially assembled on the production floor
WIQ (Work In Queue) | Material at a workstation waiting to be processed (a subset of WIP) | — | 12 units waiting in the queue at Process 3 (the bottleneck)
Touch Time | Time material is actually being worked on — excludes moving and waiting | — | 30-minute cycle time with 8 min of actual machining → touch time = 8 min
Takt Time | Time available to produce one unit to meet customer demand | Takt = Net available time / Demand | 40 hrs/week, 10 units/week → Takt = 4 hrs/unit. With 1 hr/day of breaks (5 hrs/week): Net = 35 hrs → Takt = 3.5 hrs/unit
Cycle Time | How long it takes to complete one process step from start to finish | CT = 1 / Throughput | If takt = 3.5 hrs and Process 3 takes 3.5 hrs → balanced. If it takes 4 hrs → bottleneck.
Lead Time | Total time from work requested to work delivered — includes all waiting and processing time | LT = WIP / Throughput (Little's Law) | WIP = 50 units, throughput = 10 units/day → Lead Time = 5 days
Throughput Rate | Average number of units processed per time unit | TR = 1 / Cycle Time | Cycle time = 20 min → TR = 3 units/hr → 24 units per 8-hr shift
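The formulas above reduce to a few lines of Python; this sketch reproduces the worked examples (function names are illustrative):

```python
def takt_time(net_available_hrs, demand_units):
    """Takt = net available time / customer demand."""
    return net_available_hrs / demand_units

def lead_time(wip_units, throughput_per_day):
    """Little's Law: LT = WIP / throughput."""
    return wip_units / throughput_per_day

def throughput_rate(cycle_time_min):
    """TR = 1 / cycle time, expressed here in units per hour."""
    return 60 / cycle_time_min

print(f"Takt: {takt_time(35, 10):.1f} hrs/unit")           # 3.5 hrs/unit
print(f"Lead time: {lead_time(50, 10):.0f} days")           # 5 days
print(f"Throughput: {throughput_rate(20):.0f} units/hr")    # 3 units/hr
```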

SMED — Single-Minute Exchange of Die

What & Who

SMED is a lean methodology for rapidly converting a manufacturing process from running one product to running the next. Developed by Shigeo Shingo. "Single-Minute" means less than 10 minutes (single digit) — not literally 1 minute.

Benefits
  • ✓ Reduced inventory (smaller economic batch sizes)
  • ✓ Increased machine utilisation despite more changeovers
  • ✓ Elimination of setup errors
  • ✓ Reduced defect rates (less scrap at startup)
  • ✓ Reduces Mura — balances production line
8 Techniques for Implementing SMED
  1. Separate internal from external setup operations (internal = machine must stop; external = can be done while machine runs)
  2. Convert internal to external setup
  3. Standardise function, not shape
  4. Use functional clamps or eliminate fasteners altogether
  5. Use intermediate jigs
  6. Adopt parallel operations
  7. Eliminate adjustments
  8. Mechanisation

Visual Controls — Andon & Jidoka

Visual Controls — 4 Types
Type | Question answered | Examples
Identification | What is it? | Labels, colour-coded bins, part numbers
Informational | What is the current status? | Andon lights, production boards, KPI dashboards
Instructional | How should the task be performed? | WI posted at workstation, standard work charts
Planning | What is the plan? | Kanban boards, production schedules, Gantt charts
Andon — Status Indicator Light

A visual control device that indicates the status of a machine, line, or process at a glance:

Green — Normal operations
Yellow — Changeover or planned maintenance due
Red — Problem occurred, machine/line is stopped
Jidoka — Automation with a Human Touch

The ability to stop work (machine or line) when a problem is detected. Prevents defects from being passed downstream and ensures immediate corrective action. The Andon system is the device that activates Jidoka by signalling the problem.

OEE — Overall Equipment Effectiveness

OEE measures how effectively a manufacturing operation is utilised, combining availability, performance, and quality into a single metric. World-class OEE is generally considered to be ≥85%.

OEE = Availability × Performance × Quality
Component | Formula | Measures
Availability | Run Time / Planned Production Time | Unplanned downtime losses
Performance | Actual Output / Max Possible Output | Speed losses and minor stoppages
Quality | Good Parts / Total Parts Produced | Defects and rework losses
OEE Worked Example
Planned time: 8 hrs = 480 min
Downtime: 60 min
Run time: 420 min
Availability: 420/480 = 87.5%

Ideal cycle time: 1 min/part
Actual output: 400 parts (420 possible)
Performance: 400/420 = 95.2%

Good parts: 380 of 400
Quality: 380/400 = 95.0%

OEE = 87.5% × 95.2% × 95.0% = 79.1%
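A minimal Python sketch reproducing the worked example above (the exact product is 79.2%; multiplying the rounded component percentages gives the 79.1% shown):

```python
def oee(planned_min, downtime_min, ideal_cycle_min, total_parts, good_parts):
    """OEE = Availability x Performance x Quality."""
    run_time = planned_min - downtime_min
    availability = run_time / planned_min                   # 420/480 = 87.5%
    performance = total_parts * ideal_cycle_min / run_time  # 400/420 = 95.2%
    quality = good_parts / total_parts                      # 380/400 = 95.0%
    return availability, performance, quality, availability * performance * quality

a, p, q, o = oee(planned_min=480, downtime_min=60,
                 ideal_cycle_min=1.0, total_parts=400, good_parts=380)
print(f"A = {a:.1%}, P = {p:.1%}, Q = {q:.1%}, OEE = {o:.1%}")
```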
💡

World-class benchmark: Availability ≥90%, Performance ≥95%, Quality ≥99.9% → OEE ≥85%

Root Cause Analysis — Finding the Real Problem

Most organisations fix the same problems over and over. Root cause analysis (RCA) breaks that cycle by asking why until the true source of a problem is found — then eliminating it permanently. Based on ASQ sources including Andersen & Fagerhaug and Duke Okes.

The Core Idea
🩹
Symptom Fix
"The machine keeps jamming."

→ Clear the jam. Back to work.
Problem returns next week.
🔍
Physical Cause Fix
"A worn guide rail is causing the jam."

→ Replace the rail.
Problem stays away — until the next part wears.
⚙️
System / Root Cause Fix
"No PM schedule exists for guide rails."

→ Create a preventive maintenance process.
The class of problem is eliminated.

Only a system-level cause — a change to the way the organisation operates — truly prevents recurrence. Physical cause fixes are necessary but not sufficient.

The Cause Hierarchy — Drilling Down

Every visible problem sits at the top of an iceberg. Below it are layers of cause. Most organisations only fix the visible tip.

The Cause Iceberg
👁
Symptom (Visible)
The jam. The defect. The complaint.
⬇️
First-level Cause
The worn rail. The missing label.
⬇️
Higher-level Cause
No inspection process. Poor training.
🎯
Root Cause (System)
No maintenance policy exists.
Physical Cause

The tangible, material thing that failed or caused the event. Also called direct, immediate, or proximate cause. Fixing it is necessary — but only solves this occurrence.

Human Cause

Human error, forgetfulness, or lack of skill. Critical: don't stop here. Ask what system failed to support the human. Blame eliminates people, not problems.

System / Latent Cause ← Find This

A policy, procedure, training gap, or organisational decision that created the conditions for the failure. This is the root cause. Fixing it changes how the organisation operates — preventing the whole class of problem.

The 6-Step RCA Process — The Story Arc

Think of RCA as a detective story. You start with a crime scene (the event), gather evidence (causes), interrogate witnesses (data), find the culprit (root cause), and change the system so it can never happen again.

Step 1
🔎
Define the Event

Write a precise, unambiguous description of the problem. Answer: What? When? Where? Who? How often? What consequences?

Vague: "The process is slow."
Precise: "Window replacement takes 47 min avg vs 20 min standard, occurring 3× weekly since Jan, costing $8,400/mo in overtime."
Step 2
🗺️
Find Causes

Map the process with a flowchart. Brainstorm all possible causes. Use a fishbone (Ishikawa) diagram to organise them into categories.

Key categories: Equipment · Environment · Methods · Materials · Measurement · People
Step 3
🎯
Find the Root Cause

Use the 5 Whys to drill down. Build a cause-and-event tree. Use Pareto to prioritise. Don't declare success too early.

Rule: Keep asking "why" until you reach something the organisation can change — a policy, process, or system.
Step 4
💡
Find Solutions

Generate solutions using "Why Not" principles. Use an Impact/Effort matrix to select the best option. Involve those who will implement.

Analogy thinking: how has another industry solved a similar problem? Don't be constrained by how things are currently done.
Step 5
🚀
Take Action

Use a Force Field Analysis to anticipate resistance. Run a pilot. Assign clear ownership. Be patient — lasting change takes time.

Involve those who must change their work. A solution designed against people is a solution that will fail.
Step 6
📊
Measure & Assess

Track the metrics that defined the problem in Step 1. Confirm the solution works. Assess effectiveness over time.

If the problem returns — the root cause was not truly found. Return to Step 3, not Step 5.

The 5 Whys — A Worked Example

Developed at Toyota as part of the TPS. The idea: keep asking "why" until you reach the system-level cause. Five iterations is a guideline — stop when you reach something that can be permanently changed.

Scenario
A lamp manufacturer is scrapping 12% of finished assemblies due to dimensional variation in lamp holders from a supplier.
Why #
Question
Answer
Why 1
Why are lamp holders out of spec?
Supplier dimensions vary beyond tolerance.
Why 2
Why does supplier variation exceed tolerance?
No dimensional specification was communicated to the supplier.
Why 3
Why was no specification communicated?
Procurement selected supplier on price only. Engineering was not involved.
Why 4
Why wasn't engineering involved in supplier selection?
No cross-functional supplier approval process exists.
Why 5 ✓
Why is there no cross-functional approval process?
Procurement policy only requires lowest price. Quality and engineering sign-off is not mandated. ← Root Cause
Cost reality check: Procurement saved ~$50,000/yr on purchase price. The rework and scrap cost from the same decision? Over $200,000/yr. The root cause was a procurement policy that optimised the wrong metric.

RCA Toolbox — The Right Tool for Each Step

Step 1–2 · Mapping

Fishbone (Ishikawa) Diagram

Organises possible causes into 6M categories: Machine, Method, Material, Man, Measurement, Mother Nature. The "spine" points to the problem; "bones" are cause categories.

[Fishbone sketch: spine points to the EFFECT; bones — Machine (wear), Method (no SOP), Material (wrong spec), Man (untrained), Measurement (gauge drift), Mother Nature (humidity)]
Best for: brainstorming sessions with cross-functional teams where all possible causes are unknown.
Step 3 · Drilling Down

5 Whys

Ask "why" repeatedly until a system-level cause is reached. Simple, fast, and effective for straightforward problems. For complex issues, use a Cause-and-Event Tree.

WHY 1: Machine keeps jamming → WHY 2: Guide rail is worn → WHY 3: No maintenance schedule exists → WHY 4: PM was not assigned to anyone → WHY 5 (ROOT CAUSE): No PM ownership policy exists
Warning: it is possible to arrive at the wrong root cause if evidence is not collected carefully.
Step 2–3 · Data Analysis

Pareto Chart

Ranks causes by frequency or cost. Reveals the vital few from the trivial many. The 80/20 principle — 20% of causes typically create 80% of problems.

[Pareto chart: defect categories — Dimensional, Surface, Burrs, Colour, Other — ranked by count, with a cumulative-% line and an 80% reference line]
Tip: look at the data multiple ways — by frequency AND by cost. The Pareto priority may differ.
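A minimal Python sketch of the Pareto computation (the defect tallies are hypothetical):

```python
def pareto(counts):
    """Rank categories by count and attach cumulative % — the vital few
    are the categories needed to reach ~80% of all occurrences."""
    total = sum(counts.values())
    running = 0
    for cat, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        yield cat, n, 100 * running / total

# Hypothetical defect tallies
defects = {"Dimensional": 58, "Surface": 24, "Burrs": 9, "Colour": 6, "Other": 3}
prev_cum = 0
for cat, n, cum in pareto(defects):
    flag = "  <- vital few" if prev_cum < 80 else ""
    print(f"{cat:<12} {n:>3}   cum {cum:5.1f}%{flag}")
    prev_cum = cum
```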
Step 3 · Root Cause Confirmation

Cause-and-Event Tree

A hierarchical diagram showing connections between causes at different levels. Used to prune possible causes, reveal compound causes, and trace pathways from event back to root.

[Cause-and-event tree: TOP EVENT "Product Failure" branches into Design Failure (Cause A → no design review, tolerance error) and Process Failure (Cause B → no SOP, no training); the lowest-level causes are the minimal cut sets — the root causes requiring action]
Use when: multiple independent causes exist or when cause chains are complex and branching.
Step 4 · Solution Selection

Impact / Effort Matrix

Plot each potential solution on a 2×2 grid: impact (high/low) vs effort (high/low). Quick wins sit in high-impact, low-effort. Avoid low-impact, high-effort.

[Impact/Effort grid: high impact + low effort = QUICK WINS ⭐; high impact + high effort = MAJOR PROJECTS; low impact + low effort = FILL-INS; low impact + high effort = AVOID ✗]
Involve the people who must implement — their effort estimate is the one that matters.
Step 5 · Implementation

Force Field Analysis

Lists forces driving the change against forces restraining it. Helps teams plan how to amplify driving forces and reduce resistance before implementation begins.

[Force field diagram: driving forces — management support, clear cost savings, motivated team — push toward the change target; restraining forces — budget constraints, resistance to change, skills gap — push back]
Key insight: reducing restraining forces is usually more effective than amplifying driving forces.

8 Mistakes That Kill an RCA

Stopping at the symptom

"We fixed the jam" — without asking why the jam happened or why it wasn't prevented.

Declaring success too early

Stopping at the physical cause — "we replaced the part." The system that allowed it to fail is unchanged.

Blaming people

"Operator error" is never a root cause. It is always a prompt to ask: what system failed to prevent or catch the human error?

Vague problem definition

"The process is slow." Without specifics — what, where, how often, at what cost — the team will solve different problems.

Speculation before data

Teams jump to "I think it's X" before mapping the process or collecting evidence. Confirmation bias sets in.

Ignoring compound causes

Many problems have multiple independent causes — fixing one doesn't eliminate the other. Each branch needs its own "why" chain.

Skipping the pilot

Implementing a solution at full scale without testing it first. If it doesn't work, the cost and disruption are multiplied.

Not measuring the result

Without returning to the Step 1 metrics after implementation, you never know if the root cause was truly found and fixed.

Sources: Andersen & Fagerhaug, ASQ Pocket Guide to Root Cause Analysis (ASQ Quality Press, 2014) · Duke Okes, Root Cause Analysis: The Core of Problem Solving and Corrective Action, 2nd ed. (ASQ Quality Press, 2019)
Quality Systems

QMS certification, PPAP/APQP, special characteristics, 8D problem-solving with hard deadlines, and supplier performance management — the complete automotive supply chain quality framework.

QMS Operating Standard

Quality, Cost & Delivery — Zero Defect is Not Aspirational

Every supplier QMS must deliver green-rated performance across QCD. These are operational standards with zero tolerance on Safety & Regulation requirements.

Zero-Defect Core Objectives

0 PPM Strategy

Zero defective parts shipped to the customer. No acceptable defect rate — the target is absolute prevention, not statistical tolerance.

0 Tolerance — S/R Requirements

Safety and Regulation characteristics carry absolute zero tolerance. No sampling plan, no concession, no deviation permitted.

0 IPB Strategy

Zero Incidents per Billion — the field performance target for safety-critical systems. Drives design robustness requirements upstream.

Green Supplier Scorecard

Supplier Self Assessment (SSA) fully compliant. Maintained green status on the OEM Supplier Scorecard across quality, delivery, and responsiveness metrics.

QMS Certification Progression

📊 QMS Maturity Ladder — ISO 9001 to IATF 16949 (3rd Party)
  • Level 1 — Foundation: ISO 9001, 3rd-party certified
  • Level 2 — Customer Aligned: ISO 9001 + CSRs, MAQMSR aligned (↑ add CSRs)
  • Level 3 — Automotive Grade: IATF 16949, 2nd-party audit (↑ extend to IATF scope)
  • Level 4 ★ — Gold Standard: IATF 16949, 3rd-party certified — the OEM target (↑ add 3rd-party certification)

AIAG Core Tools — All Five Required in Every Supplier QMS

  • APQP — Advanced Product Quality Planning
  • PPAP — Production Part Approval Process
  • FMEA — Failure Mode & Effects Analysis
  • MSA — Measurement System Analysis
  • SPC — Statistical Process Control
📋 Record Retention

Maintain quality records — retrievable and legible — for the life of the program. Applies to sub-suppliers.

Non-conforming product records retained for trend analysis per AIAG / ISO 9001 / IATF 16949.

🌿 Environmental Requirements
  • MINIMUM: all applicable local government regulations met.
  • TARGET: ISO 14001 Environmental Management System or equivalent.
  • NEW OEM suppliers: achieve ISO 14001 certification within 3 years of first order.
Production Part Approval Process

PPAP — Proving Production is Ready Before It Starts

PPAP is the supplier's formal proof that the production process can consistently make conforming parts at the quoted rate. It is not a one-time paperwork exercise — it is evidence of process understanding. Level 3 is the default: PSW + complete 18-element data package.

The 18 PPAP Elements — What Every Package Must Contain

The AIAG PPAP manual (4th edition) defines 18 elements. Which elements are required for submission depends on the Level (1–5) — but the supplier must generate all elements internally regardless of what is submitted to the customer.

1 Design Records

All drawings (CAD/2D), specifications, and engineering change documents. If supplier owns design: DFMEA required. Customer-owned design: drawings provided by customer.

2 Authorised Engineering Change Documents

All open engineering changes not yet incorporated into the design record. Must show written customer authorisation. Includes ECNs, deviation permits, and waivers.

3 Customer Engineering Approval

Written approval from the customer engineering activity — typically a signed prototype or pre-production buy-off. Required before production tooling is committed.

4 Design FMEA (DFMEA)

Required when supplier owns the design. Documents all potential failure modes of the design and their effects. Severity, Occurrence, and Detection ratings. Must be live — not a snapshot.

5 Process Flow Diagram

Step-by-step flow of the entire production process — from incoming material through shipping. Must match the Control Plan and PFMEA. Includes all operations, inspections, and rework loops.

6 Process FMEA (PFMEA)

Risk analysis of the manufacturing process — not the design. Documents how each process step can fail, its effect on the product, and controls in place. Drives the Control Plan. RPN threshold typically ≤100.

7 Control Plan

Three phases required: Prototype, Pre-Launch, and Production. Documents every control method for each characteristic — measurement method, frequency, sample size, reaction plan. The living document of process control.

8 Measurement System Analysis (MSA)

GR&R studies for all gauges measuring CCs and SCs. Typically 3 operators × 10 parts × 2 trials. %GRR <10% preferred; <30% conditionally acceptable; >30% — gauge must be improved before PPAP.

9 Dimensional Results

Full balloon-drawing inspection of a minimum 6 parts (or per customer requirement). Every characteristic on the print — not just CCs. Results shown in table format with nominal, tolerance, and actual measured values.

10 Material & Performance Test Results

Test results for all material specifications (tensile, hardness, chemical composition) and functional performance tests (fatigue, pressure, thermal cycling). Must include lab certification and traceability to production material.

11 Initial Process Studies (Cpk)

SPC data from the PPAP production run for all CCs and SCs. Minimum 25 subgroups / 100 data points. Cpk ≥ 1.67 required for initial study. If not achieved: 100% inspection mandatory until Cpk improves.

12 Qualified Laboratory Documentation

Scope of accreditation for all labs performing tests (internal or external). ISO/IEC 17025 accreditation preferred. Must show the tests performed are within the lab's accredited scope.

13 Appearance Approval Report (AAR)

Required only for parts with appearance specifications (colour, texture, gloss, surface finish). Customer sign-off on physical colour/texture masters. AAR is a separate customer approval — not a dimensional check.

14 Sample Production Parts

Typically 6 production parts from the PPAP run (or per customer CSR). Must be from production tooling, at production rate, using production materials. Not prototype or pre-production parts.

15 Master Sample

One part signed off by both supplier and customer. Retained at the supplier (or customer if required) as the reference standard for appearance, dimensions, and functional acceptance criteria throughout the programme.

16 Checking Aids

All part-specific gauges, fixtures, jigs, and templates used for inspection. Must be documented and calibrated. Checking aid drawings and calibration records submitted where required by the customer.

17 Customer-Specific Requirements

Any additional requirements from the OEM Customer Specific Requirements (CSRs). Each OEM publishes their own CSR supplement — e.g. GM BIQS, Ford Q1, Stellantis Supplier Quality. These override the standard PPAP manual where they conflict.

18 Part Submission Warrant (PSW)

The cover document — supplier's declaration that the submitted parts meet all requirements and the package is complete. Signed by authorised supplier representative. No PPAP is valid without a signed PSW. This is Element 18 and the final gating document.


PPAP Submission Levels — What You Send vs What You Keep

The Level defines what is physically submitted to the customer. All 18 elements must be generated and retained at the supplier site regardless of level.

Level | What is submitted to customer | When used
1 | PSW only (warrant only, no data) | Non-critical, commodity parts; customer waives data submission
2 | PSW + limited supporting data + samples | Low-risk parts; customer selects specific elements to review
3 | PSW + complete data package (all 18 elements) | Default level — used unless customer specifies otherwise
4 | PSW + other requirements as defined by customer | Customer specifies exactly what additional data is required beyond PSW
5 | PSW + complete package reviewed at supplier's manufacturing site | New suppliers, new processes, high-risk parts — customer sends team to supplier
📊 Cpk Requirements
Characteristic | Study type | Min Cpk
Critical Characteristic (CC) | Initial PPAP | ≥ 1.67
CC / SC | Ongoing production | ≥ 1.33
Below target | Any | 100% inspect
⚠️ 90-Day Change Rule

All changes require a minimum of 90 days' advance notice and written approval before implementation.

A new PPAP with PSW is required before serial production resumes after any approved change.

Triggers: manufacturing location change · material change · design change · tooling inactive 12+ months · sub-supplier change

🔗 APQP — What Feeds the PPAP Package

PPAP is the output; APQP is the process that generates it. These APQP deliverables directly populate the 18 elements:

▸ Process Flow → Element 5
▸ DFMEA → Element 4
▸ PFMEA → Element 6
▸ Control Plan (3 phases) → Element 7
▸ MSA / GR&R → Element 8
▸ Initial Cpk studies → Element 11

Special Characteristics — CC / SC / IC

Must appear on all supplier Process Flow Diagrams, FMEAs, and Control Plans. Identified by symbols on engineering drawings.

CC — Critical

Critical Characteristic

Affects government regulation compliance or safety. Any deviation could endanger the end user.

REQUIRED
  • Process performance studies + ongoing monitoring per Control Plan
  • Cpk > 1.33 (initial: 1.67) or 100% inspection
RECOMMENDED
  • 100% automatic control + poka-yoke + SPC
SC — Significant

Significant Characteristic

Important for customer satisfaction. Affects fit, functionality, durability, or processing.

REQUIRED
  • Process performance studies + ongoing monitoring
  • Cpk > 1.33
RECOMMENDED
  • 100% automatic control + poka-yoke + SPC
IC — Important

Important Characteristic

Identified by expert knowledge as important product/process parameter for quality performance.

REQUIRED
  • Process performance studies at initial and subsequent part submissions only

8D Problem Solving

Structured 8-step approach — find and eliminate the systemic weakness that allowed the problem to occur, not just fix the symptom.

D1 · Within 24h

Problem Description & Team

Define the problem with data. Assemble cross-functional team with relevant expertise. Launch immediately.

⏰ 24 hours from initial complaint
D2 · Within 24h

Problem Definition

Quantify with data. Is/Is-Not analysis. Define what is wrong, where, when, how much.

⏰ 24 hours from complaint
D3 · Within 48h — HARD DEADLINE

Containment Actions

Protect the customer immediately. Document D3 actions and verify their effectiveness.

⏰ 48 hours — non-negotiable
D4 · Within 10 working days

Root Cause Analysis

Identify root cause for occurrence AND non-detection. Use 5-Why, fishbone, fault tree.

⏰ 10 working days
D5 · Within 10 working days

Define Corrective Actions

Select best permanent corrective action. Define implementation plan with owners & dates.

⏰ 10 working days
D6 · Within 30 working days

Implement & Verify

Confirm actions implemented. Provide evidence (photos, data, updated documents). Verify effectiveness with data.

⏰ 30 working days
D7 · Per D5 plan (≤90 days)

Prevent Recurrence

Update FMEAs, Control Plans, Process Flow, work instructions, training. Apply lessons to similar processes.

⏰ Typically ≤ 90 days total
D8 · Per D5 plan (≤90 days)

Official Closure

Confirm effectiveness, remove containment, officially close, recognize the team, file the report.

⏰ Official closure ≤ 90 days

The supplier must communicate at D3, D5, and D8. When D8 takes more than 90 days, weekly reviews with the SQR are expected. A written response is required for all chargebacks, even disputed ones.

Supplier Performance Evaluation

Expectation: zero (0) defects. Performance tracked across KPI categories for volume allocation, global expansion, and future business decisions.

Scorecard KPIs

KPI | Target
Delivered Product Quality (PPM) | 0
Delivery Schedule Performance | 100%
8D On-Time Completion | Required
QMS Certification Level | IATF
PPAP On-Time Approval Rate | ≥ 98%

Response Time Requirements

Milestone | Deadline | Deliverable
Initial response | 24 hours | Problem description + team launch
D3 Containment | 48 hours | Containment actions confirmed in place
D5 Root Cause | 10 working days | Root cause + corrective action plan
D6 Implementation | 30 working days | Actions confirmed + supporting evidence
D8 Closure | ≤ 90 days | Official 8D closed & filed

🚨 Escalation Model

1
Normal

NCR Tracking

Non-conformances tracked, action plans monitored.

2
Elevated

Increased Oversight

Weekly reviews, SQR direct involvement.

3
Critical

Special Status

Customer notification, audit scheduled.

4
Severe

Business Hold

No new business awards. Potential disqualification.

Glossary

APQP — Structured methodology defining steps to ensure products satisfy customers. Covers design/development, process design, product/process validation, and feedback/corrective action.

PPAP — Defines requirements for production part approval, including bulk materials. Confirms that customer requirements are understood and the process can consistently produce conforming product at the quoted rate.

PSW — Authorizes serial production. Contains supplier/part info, required documentation, and disposition. An approved PSW is required before the first serial production shipment.

FMEA — Proactive risk management tool. Identifies potential failure modes, their effects, and causes. DFMEA is required when the supplier owns the product design; PFMEA covers process failures. RPN = Severity × Occurrence × Detection.

SNCR / SCB — SNCR (Supplier Non-Conformance Report) issued when a plant receives out-of-spec material — triggers an 8D. SCB (Supplier Charge Back) recovers costs: extra freight, line stoppages, rework, sort, scrap, travel, recalls.

REACH / SVHC — Suppliers must screen ECHA publications at least twice per year. Submit Article 33 information to customers if products contain SVHC above 0.1% w/w. Safety Data Sheets required per Art. 31 of the EU REACH Regulation.

FIFO — Inventory practice ensuring the oldest stock is shipped first. Prevents obsolete material reaching the customer. Mandatory for all suppliers. Shelf-life limits must be monitored and respected at all times.

ISO 9001:2015 — The Complete Quality Management System Standard

ISO 9001:2015 is the world's most widely adopted quality management system standard. It has evolved from the prescriptive 20-element model of the 1987 and 1994 editions to a risk-based, process-driven framework built on Annex SL's High Level Structure, enabling integration with ISO 14001, ISO 45001, and other management system standards.

ISO 9001 Revision History

Year | Edition | Key change
1987 | 1st issue | First international QMS standard — prescriptive 20-element model
1994 | 2nd issue | Minor updates, maintained 20-element structure
2000 | 3rd issue | Major restructure — process approach introduced, 8 sections
2008 | 4th issue | Clarifications only — no new requirements added
2015 | 5th issue (current) | Annex SL structure, risk-based thinking, no Quality Manual required, no Management Representative, no Preventive Action clause

Key Changes: 2008 → 2015

ISO 9001:2008 term | ISO 9001:2015 term
Products | Products and services
Documentation / Records | Documented information
Work environment | Environment for the operation of processes
Purchased product | Externally provided products and services
Supplier | External provider
Annex SL High Level Structure — identical clause numbering across all ISO management system standards (14001, 45001, 13485, etc.) enabling integrated management systems.
Risk-Based Thinking — replaces the old "Preventive Action" clause. Risk is now embedded throughout planning (§6.1) and operations.
No Quality Manual required — organisations may choose to maintain one, but §4.3 scope documentation replaces the mandatory manual.
No Management Representative — responsibility for QMS is now part of top management's role, not a delegated position.
No Exclusion Clause — §4.3 requires justification for any non-applicable requirements rather than allowing simple exclusions.

ISO 9001:2015 — 10-Section Structure (PDCA)

📐 ISO 9001:2015 Structure — PDCA Mapping
  • §1–3: Scope, Normative References, Terms
  • §4 Context (Context): organization & its context, interested parties, scope, processes
  • §5 Leadership (PLAN): top management commitment, policy, roles
  • §6 Planning (PLAN): risks & opportunities, quality objectives, planning of changes
  • §7 Support (DO): resources, competence, awareness, communication, documented information
  • §8 Operation (DO): planning & control, design, external providers, production, release
  • §9 Performance (CHECK): monitoring & measurement, internal audit, management review
  • §10 Improvement (ACT): nonconformity, corrective action, continual improvement

Document Control (ISO 9001:2015 §7.5) & Configuration Management (ISO 10007)

Documentation Hierarchy

Level | Document type | Contains | ISO 9001:2015
1 | Quality Manual | System overview, scope, policy | No longer mandatory
2 | Procedures | High-level process overview — multi-discipline, no detailed "how" | "Documented information"
3 | Work Instructions | Step-by-step "how the work is done" | Retain as evidence
4 | Forms / Records | Empty = document; filled = record | Protected from alteration
💡

§7.5.3 requires documented information to be: available and suitable for use when needed, and adequately protected. Control activities include distribution, version control, storage, retention, and disposition.

Configuration Management (ISO 10007:2017)

Configuration management ensures product integrity over time by systematically controlling changes to the interrelated functional and physical characteristics of a product.

Step | Activity
Identification | Define and label all configuration items (part numbers, revision levels)
Change Control | Formal review and approval before any change is implemented
Status Accounting | Record and report on the current state of all configuration items
Audit | Verify actual product matches documented configuration baseline
📋

Example: Product version A = Part A rev 0 + Part B rev 1 + Part C rev 7. Version B = Part A rev 0 + Part B rev 2 + Part C rev 7. Change control ensures version B is formally released before production switches.

ISO 9001 Certification Chain

📐 The Three-Tier Certification Chain
IAF — International Accreditation Forum: voluntary association of Accreditation Bodies; provides global confidence and consistency.
  ↓
Accreditation Body (AB): certifies that the Certification Body follows good practice. USA: ANAB · UK: UKAS · Standard: ISO/IEC 17011.
  ↓
Certification Body (CB) / Registrar: evaluates your QMS and issues the ISO 9001 certificate. Third-party auditing company · Standard: ISO/IEC 17021.

Core vs Support Processes

Core Processes

Processes that must be performed and have significant direct impact on the organisation's success and ability to meet customer requirements.

Examples: order processing, product design, manufacturing, delivery, customer service

Support Processes

Processes that do not directly create value for the customer but are necessary for the core processes to operate.

Examples: maintenance, purchasing, IT, HR, training, calibration

Process Approach (ISO 9001:2015 §4.4)

ISO 9001:2015 explicitly requires a process approach. Processes are defined by their inputs, outputs, interrelationships, and alignment with the strategic plan.

INPUT → PROCESS → OUTPUT
↑_______________________↑
Feedback loop

Quality Audits — Complete Reference

ISO 19011:2018 provides guidelines for auditing management systems. Audits are systematic, independent, documented processes for obtaining evidence and evaluating it objectively to determine the extent to which audit criteria are fulfilled.

Audit Types — Two Classification Systems

By Scope

Type | Scope | Purpose
System Audit | Comprehensive — multiple processes and their interactions | Overall QMS conformance
Process Audit | One specific process, activity, or function | Compare actual process to documented requirements
Product Audit | A specific product or batch | Assess "fitness for use" — does product meet design requirements?

By Party

Party | Conducted by | When
1st Party | Internal — organisation audits itself; auditors have no vested interest in the area audited | Ongoing internal improvement
2nd Party | Customer — audits its supplier before or after awarding a contract | Supplier qualification, surveillance
3rd Party | Independent audit organisation — free from any conflict of interest in the customer-supplier relationship | ISO certification, regulatory compliance

Special type | Description
Registration Audit | Third-party audit to obtain ISO 9001 (or other standard) certification
Compliance Audit | Confirms conformance to a specific standard or procedure. Differs from improvement audits — focuses on evidence of conformance, not performance improvement.

Audit Participants — Roles & Responsibilities

Role | Definition | Key responsibilities
Client | Organisation or person requesting the audit | Initiates audit · Defines purpose and scope · Provides resources · Receives report · Determines distribution · Decides on actions
Lead Auditor | Auditor responsible for leading the audit team | Develops and communicates audit plan · Assigns roles · Chairs opening and closing meetings · Ensures team stays on track · Issues report and follow-up
Auditor | Person who conducts the audit | Understands purpose and scope · Plans audit · Collects and analyses evidence · Reports findings · Follows up actions
Auditee | Organisation or individual being audited | Informs staff · Provides resources and escorts · Shows objective evidence · Cooperates · Determines and initiates corrective actions
Technical Expert | Person who provides specific knowledge or expertise to the audit team | Supports auditors with specialist knowledge — not an auditor themselves
Observer | Accompanies the audit team but does not audit | May be a trainee auditor or a regulatory observer — no active role in the audit
Guide | Person appointed by the auditee to assist the audit team | Facilitates access, escorts, helps with logistics — does not influence audit findings

The Audit Process — Six Stages

📐 Audit Process Flow (ISO 19011:2018)
① Planning & Preparation: objectives, scope, checklist
② Opening Meeting: introductions, scope confirmed
③ Audit Interviews: collect & analyse evidence
④ Closing Meeting: present findings to auditee
⑤ Audit Reporting: accurate, objective, timely
⑥ Follow-up & Closure: CA/PA verified, records retained

Audit Report — 7 Quality Characteristics

🎯
Accurate

Free from errors and distortions — purpose clearly communicated

⚖️
Objective

Fair, impartial, and unbiased — evidence-based conclusions only

💡
Clear

Easy to understand, logical flow — no ambiguous language

✂️
Concise

Straight to the point — no unnecessary detail or padding

🔧
Constructive

Helps the client improve — practical, actionable recommendations

📋
Complete

Includes all relevant facts — nothing important omitted

⏱️
Timely

Well-timed to enable decisions on recommendations — not delayed

Follow-up Actions

Correction — fix the immediate problem
Corrective Action — eliminate the root cause
Preventive Action — prevent potential future issues
Effectiveness is verified, possibly in a subsequent audit.

Cost of Quality & Quality Training

Cost of Quality — The Four Categories

Management understands the language of money. Quantifying the cost of quality justifies spending on prevention and improvement activities, and sets measurable targets. Every pound/dollar spent on prevention reduces the much larger internal and external failure costs.

✅ Prevention Cost — Doing it Right

Money spent to prevent defects from occurring in the first place. The highest-ROI category — a widely cited heuristic holds that every £1 spent on prevention saves £10–£100 in failure costs.

  • Quality planning and system development
  • Education and training (SPC, FMEA, statistical methods)
  • Design reviews and FMEA
  • Supplier reviews and qualification
  • Quality system audits
  • Process planning and capability studies
🔍 Appraisal Cost — Finding Defects

Money spent on inspecting and testing to detect defects. Necessary but non-value-adding — the goal is to reduce the need for appraisal by improving prevention.

  • Test and inspection (receiving, in-process, final)
  • Supplier acceptance sampling
  • Product audits
  • Calibration of measurement equipment
⚠️ Internal Failure Cost — Found Before Shipping

Cost of defects discovered before the product reaches the customer. Painful but preferable to external failures.

  • In-process scrap and rework
  • Troubleshooting and repair
  • Design changes caused by quality problems
  • Extra inventory to buffer poor yields
  • Re-inspection and retest of reworked items
  • Downgrading (selling at lower price)
🔥 External Failure Cost — Found by Customer

The most expensive category — defects discovered after delivery. Includes not just direct costs but reputational damage and lost future business.

  • Sales returns and allowances
  • Service level agreement penalties
  • Complaint handling and investigation
  • Warranty field labour and parts
  • Recalls
  • Legal claims and litigation
  • Lost customers and business opportunities
💡

Visible COPQ (above the waterline): rejection, rework, repair, inspection costs — easily measured

Invisible COPQ (iceberg below waterline): lost sales, excess inventory, additional controls and procedures, complaint investigation, legal fees, customer dissatisfaction — hard to quantify but often much larger

Optimum Quality Cost Model

Traditional Model (Older View)

Assumed that improving quality beyond a certain level leads to increasing costs — there was an "optimal" defect rate where prevention + appraisal costs balanced failure costs. This model suggested that 100% quality was too expensive.

Modern Model (Current View)

Quality improvement consistently leads to cost reduction — there is no point of diminishing returns. Higher quality means fewer failures, less rework, less inspection, less warranty. Crosby's "Quality is Free" thesis is supported by this model.

Quality Training — ADDIE Model

The ADDIE model is the standard instructional design framework for developing quality training programmes. It provides a systematic approach to ensure training is effective, relevant, and measurable.

A
ANALYSE

Learning environment, learners' existing knowledge, needs analysis, gap assessment

D
DESIGN

Learning objectives, exercises, content structure, lesson planning, media selection

D
DEVELOP

Create and assemble the content, materials, and resources

I
IMPLEMENT

Deliver the curriculum — method of delivery, testing procedures, actual training

E
EVALUATE

Collect feedback, measure outcomes, refine the programme

Kirkpatrick Model — 4 Levels of Training Effectiveness

Donald Kirkpatrick's four-level model (1959, still the industry standard) provides a framework for evaluating whether training actually achieves its intended purpose. Levels build on each other — you must satisfy Level 1 before Level 2 matters, and so on.

Level | Name | What is measured | How measured | Quality context
1 | Reaction | The degree to which participants find the training favourable, engaging, and relevant to their jobs | Post-training surveys, smile sheets, immediate feedback forms | Did quality engineers find the SPC training useful and applicable to their work?
2 | Learning | The degree to which participants acquired the intended knowledge, skills, attitude, confidence, and commitment | Pre/post knowledge tests, skill demonstrations, simulations | Can engineers now correctly calculate Cpk and interpret control chart signals?
3 | Behaviour | The degree to which participants apply what they learned when back on the job | Observation on the job, supervisor assessments, 90-day follow-up | Are engineers actually using SPC charts and reacting to out-of-control signals?
4 | Results | The degree to which targeted outcomes occur as a result of the training and the support package | Business metrics — scrap rate, Cpk improvement, DPMO reduction, COPQ reduction | Has the quality of shipped products improved as a result of the SPC training programme?
💡

Most organisations only measure Level 1 (satisfaction surveys) and stop there. True training effectiveness requires measuring Level 4 business results — which is the only way to justify the training investment. For quality engineers, the ROI metric is usually COPQ reduction.

Product & Process Control — Material, Nonconformance & HACCP

Section IV of the quality engineering Body of Knowledge covers the practical controls applied during production — from hazard analysis through material identification, segregation, nonconformance handling, and corrective action.

Documentation Hierarchy — Quality System Pyramid

📐 Quality System Documentation Levels
  • Level 1 — Quality Manual
  • Level 2 — Procedures (high-level process overview)
  • Level 3 — Standard Operating Procedures (SOPs)
  • Level 4 — Work Instructions (step-by-step how)
  • Level 5 — Records (completed forms = evidence of compliance)
Quality Manual

System overview, scope, policy. Not mandatory under ISO 9001:2015 but still widely used.

Procedures

High-level process overview — multi-discipline, does not include detailed "how". Answers WHAT and WHO.

SOPs & Work Instructions

Step-by-step detail of how work is performed. SOPs describe a process; WIs describe a task within a process.

Records

Empty form = document. Filled-in form = record. Records provide evidence of compliance and must be protected from unintended alteration.

HACCP — Hazard Analysis Critical Control Point

HACCP is a systematic preventive approach to food safety. It identifies physical, chemical, and biological hazards in production processes and establishes key limits to reduce these risks. The underlying goal: preventing problems from occurring is better than correcting them after the fact. The term "Critical Control Point" (CCP) is widely borrowed beyond food — it refers to any point where failure of the SOP could cause harm to customers or the business.

# | HACCP Principle | What it means
1 | Hazard Analysis | Identify all potential hazards (biological, chemical, physical) at each process step
2 | CCP Identification | Determine which steps are Critical Control Points — where control is essential to prevent/eliminate a hazard
3 | Critical Limits | Establish the maximum/minimum values (e.g. minimum cooking temperature) that must be met at each CCP
4 | Monitoring Procedures | Define how and how often each CCP will be monitored to ensure critical limits are met
5 | Corrective Actions | Specify actions to take when monitoring indicates a CCP is not under control
6 | Verification Procedures | Confirm the HACCP system is working effectively — audits, testing, record reviews
7 | Record Keeping | Maintain documentation of monitoring, deviations, corrective actions, and verification activities
CCP Examples (food industry)
  • 🌡️ Thermal processing — cooking temperature/time
  • ❄️ Chilling — storage temperature control
  • 🧪 Testing ingredients for chemical residues
  • ⚖️ Product formulation control
  • 🔩 Testing product for metal contaminants
💡

A CCP is the "stop sign" of the process — the point where if the control fails, the hazard reaches the customer. Not every process step is a CCP; only those where control is critical to safety or product integrity.

Material Identification, Status & Traceability (ISO 9001:2015 §8.5.2)

Identification

Ability to determine that the specified material grade and size are being used at every stage.

PMI (Positive Material Identification) — mandatory physical test for critical materials (e.g. alloy verification for pressure vessels, pipelines)

Status

Material must be clearly labelled with its current disposition status:

✅ APPROVED — cleared for use
⏳ QUARANTINE — awaiting decision
✗ REJECTED — do not use
Traceability (ISO 9000:2015)

Ability to identify a specific item throughout its life and link it to its Mill Test Report (MTR). Covers: origin of materials and parts, processing history, distribution and location after delivery.

ISO 9001:2015 §8.5.2:
Organisation shall control unique identification of outputs when traceability is required, and retain documented information to enable traceability.

Material Segregation & Classification

Material Segregation

Physical separation of materials to prevent mixing, cross-contamination, or unintended use. Key segregation categories:

  • ✓ Pass / Fail separation at inspection
  • ⏳ Quarantine area — material pending review decision
  • 🏷️ Different material classes (e.g. Carbon Steel vs Stainless Steel — must never mix)
Material Classification — Defect vs Nonconformity
Term | ISO 9000:2015 definition
Nonconformity | Non-fulfilment of a requirement. Broader term — includes any deviation from spec, process, or standard.
Defect | Nonconformity related to an intended or specified use. Defects adversely affect the functionality of the product. All defects are nonconformities, but not all nonconformities are defects.
💡

Use "nonconformity" in contractual/legal contexts (safer). Use "defect" only when the functionality impact is confirmed.

Nonconforming Outputs — ISO 9001:2015 §8.7

§8.7 requires that nonconforming outputs be identified and controlled to prevent unintended use or delivery. The organisation must take action based on the nature and effect of the nonconformity — including after delivery.

§8.7 Disposition option | What it means
a) Correction | Rework, repair, or reprocess to make the output conform
b) Segregation / Containment / Return / Suspend | Physically separate, return to supplier, or stop provision of service
c) Inform the customer | Notify the customer that nonconforming product may have been delivered
d) Accept under concession | Release with customer or relevant authority authorisation — documented deviation
💡

After correction, conformity must be re-verified before release. All dispositions must be documented (retain the documented information).

Corrective Action — ISO 9001:2015 §10.2

When a nonconformity occurs (including a complaint), the organisation must react and take corrective action to eliminate the root cause:

Step | Requirement
a) | React to the nonconformity — contain, correct immediately
b) | Evaluate the need to eliminate root cause(s) — to prevent recurrence
c) | Implement any needed action
d) | Review the effectiveness of the corrective action taken
e) | Update risks and opportunities if necessary
f) | Make changes to the QMS if necessary
💡

Correction vs Corrective Action: Correction fixes the immediate problem (rework). Corrective Action eliminates the root cause (process change) to prevent it recurring. Only CA prevents future occurrences.

Corrective Action Process — Problem Solving Steps

Step | Activity
1. Problem Identification | Define and quantify the problem clearly — what, where, when, how often, how much
2. Failure Analysis | Analyse the failure — what failed and how. Reproduce the failure if possible.
3. Root Cause Analysis | Identify the true root cause — use 5-Why, Fishbone, or fault tree. Address the system, not just the symptom.
4. Problem Correction | Implement the corrective action — change the process, design, procedure, or training to eliminate the root cause
5. Recurrence Control | Implement controls to prevent recurrence — update FMEA, control plan, WI, training records
6. Verification of Effectiveness | Confirm the CA worked — monitor KPIs, check DPMO, audit the new process. Close only when effectiveness is confirmed.
Preventive Action Tools
  • 🔒 Error proofing / Poka-Yoke
  • 🛡️ Robust Design (Taguchi parameter design)
  • 📋 QMS — ISO 9001:2015
  • 📊 FMEA — proactive risk identification
  • 🏭 Lean thinking — 5S, standard work
💡

Correction vs CA vs PA (ISO 9000:2015): Correction = fix this defect now. Corrective Action = eliminate the cause so it doesn't recur. Preventive Action = eliminate the cause of a potential (not yet occurred) problem.

Seven Basic Quality Tools

Introduced by Kaoru Ishikawa in the 1960s, these seven tools form the foundation of quality problem-solving. All Quality Circle members are trained to use them. Together they move a team from raw data collection through root cause identification to ongoing process monitoring.

1

Check Sheet

A structured data-collection form used to manually tally and record the number of observations of specific events. It is the first tool applied — it creates the raw data that feeds every other tool.

When to use: At the start of any investigation. "What is happening, how often, and where?"
Key principle: Design the sheet before collecting data so it captures exactly what you need — category, time, location, shift.
Example — Water Bottle Manufacturing Defect Tally
Size | Scratch | Loose Cap | Label | Volume | Leakage | Total
300 ml | 2 | 4 | 1 | 1 | 3 | 11
500 ml | 3 | 4 | 2 | 1 | 2 | 12
1000 ml | 5 | 4 | 1 | 1 | 2 | 13
Sum | 10 | 12 | 4 | 3 | 7 | 36

Check Sheet → Pareto Chart: Loose Cap (12) is the #1 defect → fix first
2

Cause-and-Effect Diagram

Also called Fishbone or Ishikawa diagram. Graphically displays the relationship between an effect (the problem) and all possible causes, organised by the 6M categories.

6M Categories: Man · Machine · Method · Material · Measurement · Mother Nature (Environment)
Key principle: Qualitative tool — surfaces possible causes, not confirmed causes. Complement with data to validate. Invented by Kaoru Ishikawa (1943).
Fishbone Diagram — Water Bottle Fill Inconsistency
[Fishbone: effect "Inconsistent Fill Volume" on the spine; 6M branches carry the candidate causes — Man (fatigue, training), Machine (wear, calibration), Method (procedure, speed), Material (viscosity, temperature), Measurement (gauge R&R, resolution), Environment (humidity, temperature).]
3

Histogram

A bar chart displaying the distribution of measurements — the bars touch (continuous data). Quickly reveals the centre, spread, and shape of the data, providing clues to reducing variation.

Shape patterns to watch: Normal (bell) · Skewed left/right · Bimodal (two peaks — mixing two processes) · Uniform · Comb (measurement resolution too coarse)
Key distinction: Bars touch = continuous data (histogram). Bars separate = categorical data (bar chart). Never confuse the two.
Four Common Shapes — What Each Means
Normal: symmetric, process centred · Skewed Right: long tail right, median < mean · Bimodal: two peaks, mixing two processes · Uniform: all values equally likely. The histogram's shape guides the action: stratify, investigate mixing, or check centering.
4

Pareto Chart

Bars in descending order of magnitude with a cumulative percentage line. Based on the Pareto Principle (80/20 rule): approximately 80% of problems come from 20% of causes.

How to read it: Find where the cumulative % line crosses 80%. The bars to the left of that point are the "vital few" — address these first for maximum impact.
Pro tip: Use Stratification after Pareto — split the Pareto by machine, shift, or operator to reveal which sub-group is driving the top defect.
Pareto Chart — Water Bottle Defects (n=36)
[Pareto chart, n=36: Loose Cap 18, Label 8, Scratch 5, Leakage 3, Volume 2; cumulative line runs 50% → 72% → 86%, crossing the 80% reference between Label and Scratch. Vital few: Loose Cap + Label; trivial many: the rest.]
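A minimal sketch of the vital-few cut using the defect counts from this chart:

```python
counts = {"Loose Cap": 18, "Label": 8, "Scratch": 5, "Leakage": 3, "Volume": 2}
total, cum = sum(counts.values()), 0.0

for defect, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    cum += 100 * n / total
    tag = "vital few" if cum <= 80 else "trivial many"
    print(f"{defect:9s} {n:3d}  cum {cum:5.1f}%  ({tag})")
# Loose Cap (50.0%) and Label (72.2%) fall left of the 80% line: attack these first
```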
5

Scatter Diagram

A plot of one variable against another on an X-Y graph. Reveals the strength and direction of a relationship between two variables. Leads into regression analysis in DMAIC Analyse phase.

⚠️ Correlation ≠ Causation. A strong scatter pattern shows a relationship exists — not that X causes Y. Always ask if a third variable could be driving both.
5 patterns: Strong positive · Weak positive · No relationship · Weak negative · Strong negative
Scatter — Hours Studied vs Test Score (%)
[Scatter plot: hours studied (X) vs test score % (Y); r = 0.88 (strong positive); fitted line Y = 15.79 + 0.97X.]
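The r and fitted line above come from the plotted points, which aren't tabulated here; a sketch with illustrative (x, y) pairs shows how both are computed:

```python
import numpy as np

hours = np.array([10, 20, 25, 35, 45, 55, 60, 70])   # illustrative data
score = np.array([28, 34, 40, 49, 58, 70, 73, 85])

r = np.corrcoef(hours, score)[0, 1]             # Pearson correlation
slope, intercept = np.polyfit(hours, score, 1)  # least-squares line
print(f"r = {r:.2f};  Y = {intercept:.2f} + {slope:.2f}X")
```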
6

Control Chart

A line graph of measurements over time with statistically derived UCL and LCL. The most powerful of the 7 tools. Distinguishes common cause from special cause variation — tells the operator when to act and when to leave the process alone.

The golden rule: Reacting to common cause variation (tampering) makes the process WORSE. Only investigate and act on special cause signals (points outside UCL/LCL, runs, trends).
Special cause signals: Point beyond 3σ · 7-in-a-row same side · 7-in-a-row trend · 2 of 3 in Zone A · Stratification / Hugging
X̄-Chart — Fill Volume (cc) with Special Cause Signal
[X̄ chart of fill volume (cc) with zones A/B/C between CL and UCL/LCL: a 7-point upward trend (Rule 3, tool wear/drift) followed by a point beyond the UCL (Rule 1). React to special causes; never adjust for common cause noise.]
7

Stratification

Breaking data down into meaningful sub-categories (machine, shift, material, operator, time period) so patterns that are hidden in the combined data become visible.

When to use: When a Pareto or histogram of combined data doesn't explain enough. Ask: "Is this data actually from the same process?" Often the answer is no.
Classic example: Combined Pareto shows "Loose Cap" as #1. After stratification by shift, Shift 2 is responsible for 80% of all loose cap defects — pinpointing where to focus.
Stratification — Combined vs Split by Bottle Size
[Pareto stratified by bottle size: the combined chart (Cap 18, Label 8, Scratch 5, Leak 3, Vol 2) splits into distinct per-size patterns, revealing that the Loose Cap problem differs by bottle size and steering the root cause analysis to the right place.]

Seven Management & Planning Tools

The Seven Management and Planning Tools (7MP / New Seven Tools) complement the Basic 7 by handling qualitative, language-based, and planning data. Where the Basic 7 analyse numbers, the 7MP tools organise ideas, reveal relationships, and plan complex activities. They are particularly powerful in the early stages of DMAIC (Define/Measure) and for strategic planning.

1

Affinity Diagram (KJ Method)

Organises a large number of ideas, opinions, or facts into natural groupings by affinity (similarity). Developed by Japanese anthropologist Kawakita Jiro (KJ). Ideal after brainstorming when you have 20–200+ ideas to make sense of.

Process: Write each idea on a separate sticky note → silently group related ones together → give each group a header card that captures the theme → discuss emerging patterns.
Affinity Diagram — "How to Pass the Quality Engineer Exam"
[Affinity grouping: 20+ brainstormed ideas sorted silently into three natural groups — 📚 Content Mastery (detailed coverage, cover the BoK, slides & notes, handbook reference), ✏️ Active Practice (more quizzes, flash cards, practice problems, past exam papers), 🎯 Engagement Style (short videos, interactive, to the point, easy to understand).]
2

Tree Diagram

Breaks down a broad goal into progressively finer levels of detail. Reveals all the activities, tasks, and sub-tasks that must be accomplished to achieve the objective. Also used to show hierarchical structures.

Input to PDPC: The tree diagram becomes the starting structure for a PDPC — the next tool then adds "what could go wrong" and countermeasures to each task node.
Tree Diagram — Passing the Quality Engineer Exam
[Tree: the goal "Certified" breaks down into Motivation (organisational support, financial support), Resources (binder + handbook, video course, this reference tool; free YouTube vs paid Udemy course), and Practice (quizzes & mocks, timed practice) — goal → sub-goals → tasks → actionable activities.]
3

PDPC — Process Decision Program Chart

Identifies what could go wrong in a plan and develops countermeasures before problems occur. Similar to FMEA for project plans. Starts with a tree diagram and adds risk branches with labelled countermeasures: O = practical, X = impractical.

Key distinction from FMEA: PDPC is for project plans and new initiatives. FMEA is for product/process design. Both are proactive risk tools.
PDPC — Video Course Resource Planning
[PDPC: under the goal "Certified Exam", the Video Course branch carries the risks "too much irrelevant info" and "course goes off topic" with countermeasures O: curated playlist / X: skip to next course; the Practice Tests branch carries "not enough questions" and "poor answer explanations" with O: add ASQ mock exam / X: change provider. O = practical countermeasure, X = impractical countermeasure.]
4

Matrix Diagram

Shows the relationship between two or more groups by arranging them in rows and columns with relationship symbols at intersections. Multiple shapes: L-shaped (2 groups), T-shaped (3 groups), Roof-shaped (1 group vs itself — used in House of Quality).

QFD connection: The House of Quality uses an L-shaped matrix (VOC vs engineering requirements) and a roof-shaped matrix (engineering requirement interactions). The roof is a matrix diagram.
L-Shaped Matrix — Product vs Customer Criteria (1=weak, 5=strong)
Criteria (weight) | Product 1 | Product 2 | Product 3 | Product 4
Efficiency (0.3) | 2 | 3 | 5 | 2
Look (0.4) | 1 | 2 | 5 | 4
Comfort (0.2) | 2 | 1 | 4 | 2
Pickup (0.1) | 1 | 3 | 4 | 5
TOTAL (weighted) | 1.5 | 2.2 | 4.7 ✓ | 3.1

Product 3 wins on weighted criteria — objective, transparent decision-making
5

Interrelationship Digraph

Analyses cause-and-effect relationships between multiple factors in a complex situation. Unlike fishbone (one effect), the digraph handles multiple interconnected causes and effects simultaneously — ideal for chronic, systemic quality problems.

Reading the diagram: Count arrows in and out. Most outgoing arrows = root cause (driver). Most incoming arrows = key effect (outcome indicator). Focus improvement on root causes.
Node with most outgoing arrows is usually the best leverage point for change.
Interrelationship Digraph — Poor Quality (In/Out count shown)
[Interrelationship digraph for "Poor Quality": Lack of Mgmt Support (out: 4 → root cause / driver) feeds No Training (out 2 / in 2), No Calibration (out 1 / in 2), No Maintenance (out 1 / in 2), and No Procedures (out 2 / in 2), all converging on Poor Quality (in: 4 → outcome).]
6

Prioritisation Matrix

Compares and ranks choices against weighted criteria to select the best option objectively. Removes subjectivity from project selection, supplier choice, or design decisions. Each criterion has a weight (sum to 1.0), and each option is rated 1–5 against each criterion.

Formula per cell: Rating × Weight. Sum all weighted cells for each option — highest total wins. The weighting step is what differentiates this from a simple score matrix.
Prioritisation Matrix — Quality Improvement Project Selection
Criteria (weight) | Project A | Project B | Project C | Project D
Cost savings (0.4) | 3×0.4 = 1.2 | 2×0.4 = 0.8 | 5×0.4 = 2.0 | 1×0.4 = 0.4
Ease of impl. (0.3) | 4×0.3 = 1.2 | 5×0.3 = 1.5 | 4×0.3 = 1.2 | 3×0.3 = 0.9
Customer impact (0.2) | 2×0.2 = 0.4 | 3×0.2 = 0.6 | 4×0.2 = 0.8 | 5×0.2 = 1.0
Risk level (0.1) | 3×0.1 = 0.3 | 5×0.1 = 0.5 | 3×0.1 = 0.3 | 4×0.1 = 0.4
TOTAL | 3.1 | 3.4 | 4.3 ✓ | 2.7
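The whole technique is one weighted sum per option; a minimal sketch in Python using the project-selection numbers above:

```python
weights = {"Cost savings": 0.4, "Ease of impl.": 0.3,
           "Customer impact": 0.2, "Risk level": 0.1}
ratings = {  # option -> {criterion: rating 1-5}
    "Project A": {"Cost savings": 3, "Ease of impl.": 4, "Customer impact": 2, "Risk level": 3},
    "Project B": {"Cost savings": 2, "Ease of impl.": 5, "Customer impact": 3, "Risk level": 5},
    "Project C": {"Cost savings": 5, "Ease of impl.": 4, "Customer impact": 4, "Risk level": 3},
    "Project D": {"Cost savings": 1, "Ease of impl.": 3, "Customer impact": 5, "Risk level": 4},
}
for option, r in ratings.items():
    total = sum(r[c] * w for c, w in weights.items())
    print(f"{option}: {total:.1f}")   # A 3.1, B 3.4, C 4.3 (winner), D 2.7
```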
7

Activity Network Diagram (CPM / PERT)

Manages tasks in sequence to identify the critical path, bottlenecks, and float (slack). The Critical Path Method (CPM) finds the longest sequence of dependent tasks — delays on the critical path delay the entire project.

Float (Slack): Amount a task can slip without delaying the project. Activities on the critical path have zero float. Non-critical activities have positive float — they can slip without affecting the end date.
PERT extends CPM by using probabilistic time estimates: Expected Time = (Optimistic + 4×Most Likely + Pessimistic) / 6
Activity Network — Critical Path Highlighted (Red = Critical, Green = Float available)
[Activity network: Start (Day 0) → A: Requirements (2d) → B: Design (4d) → {C: Review (1d), D: Build (2d)} → E: Test & Deploy (7d) → End (Day 15). Critical path: Start → A → B → D → E = 15 days, zero float (red); C is non-critical with 4d float (green) and can slip up to its float without delaying the project.]
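A sketch of the forward pass that yields the project duration (the dependency structure is an assumption where the network is ambiguous), plus the PERT expected-time formula:

```python
# task -> (duration in days, predecessors); assumed from the network above
tasks = {"A": (2, []), "B": (4, ["A"]), "C": (1, ["B"]),
         "D": (2, ["B"]), "E": (7, ["C", "D"])}

finish = {}
for t in ["A", "B", "C", "D", "E"]:             # topological order
    dur, deps = tasks[t]
    finish[t] = max((finish[d] for d in deps), default=0) + dur
print(finish["E"])   # -> 15 days, matching the critical path A-B-D-E

def pert(optimistic, most_likely, pessimistic):
    """PERT expected time = (O + 4M + P) / 6."""
    return (optimistic + 4 * most_likely + pessimistic) / 6
print(pert(3, 5, 13))  # -> 6.0 expected days
```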
Statistical Process Control

SPC is manufacturing's early-warning system — detecting real process shifts before they become defects, while distinguishing true signals from random noise.

Cp, Cpk, Pp, Ppk — The Capability Family

Capability indices answer two separate questions: "Can the process fit within spec?" (Cp) and "Is it actually centred there?" (Cpk). The gap between them is your centering loss.

📊 Centered (Cp = Cpk) vs Off-Center (Cp > Cpk) — Same Process Spread
[Side-by-side capability curves with the same 6σ spread: centred process (μ mid-spec → Cp = Cpk = 1.33; 99.73% of output within ±3σ) vs off-centre process (μ shifted +1.33σ toward USL → only 2.67σ to USL, so Cpk = 0.89, while 5.33σ remain to LSL; the right tail beyond USL produces real defects).]
Cp — Potential (short-term)
Cp = (USL − LSL) ÷ 6σ
Assumes perfect centering. Answers: can the process fit?
Cpk — Actual (centred?)
Cpk = min[(USL−µ)/3σ, (µ−LSL)/3σ]
Accounts for centering. Answers: is it centered there?
Pp — Long-term Potential
Pp = (USL − LSL) ÷ 6s
Uses sample std dev s (not σ). Long-term spread.
Ppk — Long-term Actual
Ppk = min[(USL−x̄)/3s, (x̄−LSL)/3s]
Overall performance including all variation sources.
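A minimal Python sketch of these four indices (hypothetical fill-volume data; in practice the σ for Cp/Cpk comes from a within-subgroup estimator such as R̄/d₂, while Pp/Ppk use the overall sample standard deviation as shown here):

```python
import statistics

def cp(usl, lsl, sigma):
    """Potential capability: spec width over six sigmas."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mean, sigma):
    """Actual capability: distance from the mean to the nearest limit."""
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))

# Hypothetical fill-volume data (cc) against a 500 +/- 3 cc spec
data = [499.2, 500.1, 500.8, 499.7, 500.3, 499.9, 500.6, 499.5]
mean, s = statistics.mean(data), statistics.stdev(data)
print(f"Pp  = {cp(503, 497, s):.2f}")         # same formula as Cp, with s
print(f"Ppk = {cpk(503, 497, mean, s):.2f}")  # same formula as Cpk, with s
```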

Cpk Acceptance Thresholds

Cpk < 1.0
Incapable ✗
Cpk = 1.00
Marginal
Cpk = 1.33
Capable ✓
Cpk = 1.67
Highly Capable
Cpk ≥ 2.0
World Class
💡

Large Cp − Cpk gap? Fix centering first — not spread reduction. If Cp ≥ 1.33 but Cpk < 1.33, the process is capable of meeting spec but is running off-target. Adjust the mean before spending on variation reduction.

📋 Cpk Targets by Char. Type

Characteristic | Ongoing Min Cpk | Initial (PPAP)
CC (Critical) | ≥ 1.33 | ≥ 1.67
SC (Significant) | ≥ 1.33 | ≥ 1.67
General process control | ≥ 1.33 | ≥ 1.33

Cpk → Defect Relationship

Cpk | DPMO (approx)
1.00 | 2,700
1.33 | 63
1.50 | 6.8
1.67 | 0.57
2.00 | 0.002
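The table follows directly from the normal model: the nearest specification limit sits 3·Cpk standard deviations from the mean, and for a centred process both tails contribute. A quick check with scipy, assuming a centred process:

```python
from scipy.stats import norm

def cpk_to_dpmo(cpk):
    z = 3 * cpk                    # nearest spec limit is 3*Cpk sigmas away
    return 2 * norm.sf(z) * 1e6    # two-sided defects per million (centred)

for c in (1.00, 1.33, 1.50, 1.67, 2.00):
    print(f"Cpk {c:.2f} -> {cpk_to_dpmo(c):,.3f} DPMO")
# Reproduces the table above: ~2700, ~63, ~6.8, ~0.57, ~0.002
```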

Control Chart Selection Guide

The right chart depends on two things: data type (measured value vs pass/fail count) and subgroup size. Using the wrong chart gives misleading signals.

📊 Which Control Chart? — Decision Tree
  • Variables data (measured values) → choose by subgroup size n:
      · n = 1 → I–MR chart (Individuals & Moving Range)
      · n = 2–8 → X̄–R chart (most common; the automotive standard) ★
      · n > 8 → X̄–S chart (more efficient for large subgroups)
  • Attribute data (counts / pass-fail) → defective items or defects per unit:
      · Defective items → p chart (variable n) or np chart (fixed n)
      · Defects per unit/area → c chart (fixed area) or u chart (variable area)

X̄ chart monitors process mean (location); R chart monitors within-subgroup spread. Uses constants A₂, D₃, D₄ from standard tables. Best for rational subgroups of 2–8. Most common in PPAP control plans and IATF 16949 production environments.
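A sketch of the X̄–R limit calculation (the constants shown are the published values for subgroup size n = 5; the subgroup data is hypothetical):

```python
# Published control chart constants for n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

subgroups = [[10.1, 9.9, 10.0, 10.2, 9.8],
             [10.0, 10.1, 9.9, 10.0, 10.1],
             [9.8, 10.0, 10.2, 9.9, 10.1]]

xbars   = [sum(g) / len(g) for g in subgroups]
ranges  = [max(g) - min(g) for g in subgroups]
xbarbar = sum(xbars) / len(xbars)     # grand mean -> X-bar chart centreline
rbar    = sum(ranges) / len(ranges)   # average range -> R chart centreline

print(f"X-bar: CL={xbarbar:.3f}  UCL={xbarbar + A2 * rbar:.3f}  LCL={xbarbar - A2 * rbar:.3f}")
print(f"R:     CL={rbar:.3f}  UCL={D4 * rbar:.3f}  LCL={D3 * rbar:.3f}")
```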

Used when one measurement per cycle is all that's available: slow processes, destructive testing, chemical batches, daily lab results. Less sensitive to small shifts than X̄–R. Moving Range tracks point-to-point variation.

p-chart: proportion defective (variable subgroup size). np-chart: count defective (constant n). Both use the binomial distribution. Foundation of attribute acceptance sampling plans.

c-chart: total defect count per unit (constant inspection area). u-chart: defects per unit (variable inspection area). Both based on the Poisson distribution. Examples: scratches per panel, solder defects per board, paint runs per door.

Western Electric / Nelson Out-of-Control Rules

These 8 patterns on a control chart each indicate a special cause of variation — something changed in the process. Any single rule triggering is sufficient grounds for investigation. The table below gives each rule's signal condition and its usual physical cause.

# | Rule | Signal condition | What it usually means
1 | Beyond ±3σ | 1 point outside control limits | Sudden shift, special event, measurement error
2 | 9 same side | 9 consecutive points all above or all below CL | Process mean shift, new lot, operator change
3 | 6 trend | 6 consecutive points all increasing or all decreasing | Tool wear, gradual drift, raw material degradation
4 | 14 alternating | 14 consecutive points alternating up/down | Two processes alternating, overadjustment/tampering
5 | 2 of 3 beyond ±2σ | 2 of 3 consecutive beyond ±2σ same side | Process shift starting, material lot change
6 | 4 of 5 beyond ±1σ | 4 of 5 consecutive beyond ±1σ same side | Systematic bias, gradual drift
7 | 15 within ±1σ | 15 consecutive points all within ±1σ of CL | Stratification — mixed streams in subgroups, limits too wide
8 | 8 beyond ±1σ both sides | 8 consecutive points outside ±1σ (above and below) | Bimodal / mixture of two process distributions
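As a sketch of how the two most practically important rules reduce to code (hypothetical data; `cl` and `sigma` would come from the chart's own centreline and estimated sigma):

```python
def rule1(points, cl, sigma):
    """Rule 1: any single point beyond +/- 3 sigma of the centreline."""
    return [i for i, x in enumerate(points) if abs(x - cl) > 3 * sigma]

def rule2(points, cl, run=9):
    """Rule 2: `run` consecutive points on the same side of the centreline."""
    hits, streak, side = [], 0, 0
    for i, x in enumerate(points):
        s = 1 if x > cl else -1 if x < cl else 0  # a point on the CL breaks the run
        streak = streak + 1 if (s == side and s != 0) else 1
        side = s
        if streak >= run:
            hits.append(i)
    return hits

data = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.2, 0.5, 0.4, 3.4]
print(rule1(data, cl=0.0, sigma=1.0))  # -> [9]: last point beyond 3 sigma
print(rule2(data, cl=0.0))             # -> [8, 9]: 9th and 10th point above CL
```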
💡

Common cause vs special cause. Control charts separate random noise (common cause — inherent system variation) from assignable events (special cause — investigate and fix). Reacting to common cause variation is tampering — it adds variation. Rules 1 and 2 are the most practically important: use them always. Rules 5–8 add sensitivity but also false alarms — apply them when the cost of missing a shift is high.

Capability Analysis — The Complete Framework

Capability analysis answers a single fundamental question: can this process reliably produce output that meets customer requirements? It does so by fitting a statistical model to process data and estimating the probability of producing nonconforming product — now and in the future. Before any capability number is trustworthy, however, three conditions must hold: the process must be stable, the data must be approximately normal, and there must be enough observations for the statistics to carry real precision. Failing any one of these makes the resulting Cpk figure meaningless.

📐

Two types of capability study: A single-variable capability analysis evaluates one CTQ characteristic against its specification limits. A before/after capability comparison determines whether a process improvement project produced a measurable, statistically confirmed improvement in capability — not just noise.

📊 Three Prerequisites Before Computing Any Capability Index
① Stability: the process must be in statistical control. Verify via an I-MR or X̄-R/S chart first; if unstable, find and remove special causes.
② Normality: the data must follow a normal (or near-normal) distribution. Test with Anderson-Darling; if non-normal, apply a Box-Cox transform or use non-normal capability methods.
③ Sufficient data: minimum 100 observations preferred; absolute minimum 30 (Bothe, 1997). 100 observations give a 90% CI within ±15% of the true Z.

① Process Stability — The First Gate

Capability statistics estimate a future defect rate, not just a historical snapshot. That projection is only valid if the process is operating in a stable, predictable state — meaning only common-cause variation is present and no special causes are inflating or shifting the output. A capability study on an unstable process produces a number that describes a process that no longer exists.

The eight Western Electric stability tests are available for variables control charts, but using all eight simultaneously drives up the false-alarm rate. Research comparing sensitivity and false-alarm behaviour identified three tests that give the best balance for capability pre-screening:

TEST 1 — Always Used

Point Beyond Control Limits

Signals when any single point lies more than 3 standard deviations from the centreline. Universally recognised as the primary out-of-control signal. False alarm rate: 0.27% — the baseline for all other tests. Applied to all chart types: I-MR, X̄-R/S.

Signal: 1 point > ±3σ from CL · FAR=0.27%
TEST 2 — Detects Mean Shifts

9 Consecutive Points, One Side

Signals when 9 successive points all fall on the same side of the centreline. Simulation showed that combining Test 2 with Test 1 reduces the average subgroups needed to detect a 0.5σ mean shift from 154 to just 57 — a 63% improvement in detection speed. Applied to I-chart and X̄-chart only.

Signal: 9 pts same side · Detects small shifts
TEST 7 — Detects Stratification

12–15 Points Within ±1σ

Signals when an unusual number of consecutive points cluster within ±1σ of the centreline — the opposite of what Test 1 catches. This pattern reveals stratification: multiple distinct populations mixed into a single subgroup (e.g. two machines sampled together). Used only on the X̄-chart when limits are estimated from data.

k = subgroups × 0.33 · min 12, max 15 pts
k = subgroups × 0.33 | Points in a row required for a Test 7 signal
k < 12 | Use the fixed minimum of 12 — too few subgroups for the adaptive rule
12 ≤ k ≤ 15 | Adaptive: scale with data volume for balanced sensitivity
k > 15 | Cap at 15 — prevents excessive false alarms with large datasets
⚠️

Tests 3, 4, 5, 6, and 8 are excluded from pre-capability screening. Tests 3 (trends) and 4 (alternating) add no unique detection power over Tests 1+2. Tests 5, 6, and 8 don't isolate special cause patterns common enough to justify their false-alarm cost. For the R, S, and MR charts (spread charts), only Test 1 is applied — extreme spread points are the only practically relevant signal.

② Normality Testing — The Anderson-Darling Approach

Standard capability indices (Cp, Cpk, Pp, Ppk) are derived from the normal distribution. They convert a Z-score — the number of standard deviations between the process mean and the nearest specification limit — into a defect probability using the normal CDF. If the process data doesn't follow a normal distribution, those Z-to-DPMO conversions are wrong, and every capability index based on them is wrong.

The Anderson-Darling (AD) test is the preferred normality test for capability pre-screening. Compared to other goodness-of-fit tests, the AD test has higher statistical power — especially in the tails of the distribution, which is precisely where capability defects occur. The concern that the AD test becomes overly strict with large samples is not supported by simulation evidence: across sample sizes from 500 to 10,000, and across normal populations with varying spreads, the Type I error rate consistently tracks the target significance level (≈5% at α=0.05).

📊 Normality Assessment Decision Flow
Run the AD test on the data → is p ≥ 0.05? YES: proceed with normal capability. NO: is Box-Cox feasible? YES: transform and re-test AD; if p ≥ 0.05, compute capability on the λ-transformed data. NO: use non-normal capability methods.
AD Test — Why it beats alternatives

The AD test accumulates squared deviations between the empirical CDF and the theoretical normal CDF with extra weight given to the tails. Since nearly all capability defects occur in the tails, this weighting is exactly what's needed. The Kolmogorov-Smirnov test applies equal weight throughout the distribution — it can miss tail problems that dominate capability.

Box-Cox Transform — When normality fails

The Box-Cox power transformation x → (x^λ − 1)/λ can often convert a moderately skewed distribution into an approximately normal one. The optimal λ is found by maximum likelihood. Once transformed data passes the AD test, capability indices are computed on the transformed scale. The Cpk result describes performance on the original scale after back-transformation.
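A sketch of this gate with scipy (note that scipy's `anderson` returns a statistic with critical values rather than a p-value, so the pass/fail check compares against the 5% critical value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.lognormal(mean=0.0, sigma=0.4, size=200)   # deliberately skewed

res = stats.anderson(data, dist="norm")
crit_5 = res.critical_values[list(res.significance_level).index(5.0)]

if res.statistic > crit_5:                 # fails normality at alpha = 0.05
    transformed, lam = stats.boxcox(data)  # maximum-likelihood lambda
    res2 = stats.anderson(transformed, dist="norm")
    print(f"lambda = {lam:.2f}, AD statistic after transform = {res2.statistic:.3f}")
```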

AD Test Simulation Evidence — Type I Error & Power

Extensive simulation work confirmed two important properties of the AD test for capability analysis contexts:

Property | What was tested | Result | Practical meaning
Type I Error | 5,000 samples from normal populations (σ = 0.1 to 70) at n = 500 to 10,000 | ≈ 5% rejection rate at α = 0.05, consistent across all sample sizes and dispersions | The AD test does not become overly strict with large datasets — a common practitioner concern that simulation disproved
Power (correct rejection) | 5,000 samples from 17 non-normal distributions (t, Laplace, Uniform, Beta, Gamma, Weibull, etc.) | ≈ 100% rejection for nearly all distributions at n ≥ 500 | If your data isn't normal, the AD test will detect it — with one exception
Power exception | Beta(3,3) at n < 1000; Weibull(4,4) at n < 3000 | Not reliably rejected | These distributions are visually indistinguishable from normal — a normal capability model provides a good approximation and produces reliable estimates

③ How Much Data Do You Actually Need?

The required sample size depends on two things: the true capability of your process and the precision you need from the estimate. These are connected — at high sigma levels, even rough estimates of Z (±15%) translate into a range of DPMO values that is practically acceptable. At lower sigma levels, the same ±15% range spans thousands of DPMO, which may be unacceptable for decision-making.

The AIAG SPC reference manual recommends at least 25 rational subgroups and a minimum of 100 total measurements. Independent simulation work generating 10,000 benchmark-Z estimates at each sample size confirmed this guidance:

Confidence level | Precision margin | Target Z > 3 (typical capable process) | Target Z ≈ 2.5 (marginal process)
90% | ±15% of true Z | ~100 observations |
90% | ±10% of true Z | ~175 observations | ~215 observations
90% | ±5% of true Z | ~650 observations | ~750 observations
95% | ±15% of true Z | ~150 observations | ~175 observations
95% | ±10% of true Z | ~200 observations | ~250 observations
💡

The 100-observation rule explained: With 100 measurements from a process where Z>3, you can be 90% confident that your computed benchmark Z lies within ±15% of the true Z. For a truly 6σ process (Z=4.5 long-term), that confidence interval spans roughly Z=3.8 to Z=5.2. Doubling to 175 observations tightens this to ±10%. For most industrial go/no-go decisions, 100 measurements is sufficient; for precise capability reporting in supply chain audits, target 175+.

Why Precision Matters More at Lower Sigma Levels

True Z | True DPMO | ±15% precision → Z range | DPMO range at ±15% | Practical impact
4.5σ | 3.4 | 3.83 – 5.18 | 0.9 – 13.3 DPMO | Acceptable — the difference between 1 and 13 defects per million is rarely decision-critical
3.0σ | 1,350 | 2.55 – 3.45 | ~280 – 5,400 DPMO | Significant — a 19× DPMO range makes pass/fail decisions unreliable
2.5σ | 6,210 | 2.13 – 2.88 | 1,970 – 16,400 DPMO | Unacceptable for reporting — increase sample size to ≥200 before drawing conclusions

The Recommended Capability Study Sequence

  1. Define CTQ and specification limits: confirm LSL/USL are customer-driven, not internally tightened. Incorrect spec limits make all downstream analysis meaningless.

  2. Validate the measurement system (GR&R): if the gauge R&R exceeds 30% of process variation, the capability index will be systematically underestimated. Fix measurement before measuring capability.

  3. Plot an I-MR or X̄-R/S control chart: run Tests 1, 2, and 7 (for X̄). Remove special causes before continuing. Do not compute Cpk from an unstable process.

  4. Test for normality with Anderson-Darling: if p < 0.05, attempt a Box-Cox transformation. If transformation fails, use non-normal capability methods (e.g. Weibull capability, non-parametric percentile approach).

  5. Collect at least 100 observations: fewer than 30, do not report Cpk; 30–99, flag as preliminary; 100+, acceptable for capability reporting; 175+, preferred for formal PPAP submission.

  6. Compute and report Cp, Cpk, Pp, Ppk: report confidence intervals alongside the point estimates (see the sketch after this list). A Cpk of 1.35 with a 95% CI of [1.10, 1.62] tells a very different story than just "1.35".

  7. Interpret with AIAG thresholds — but don't stop there: Cpk ≥ 1.67 (initial CC), Cpk ≥ 1.33 (ongoing). Always pair the capability index with a probability plot, histogram overlay, and DPMO estimate. Never report a number without context.
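One common way to attach that interval is Bissell's (1990) approximation for the standard error of the estimated Cpk; a minimal sketch (the approximation assumes normal data and a stable process):

```python
from math import sqrt
from scipy.stats import norm

def cpk_ci(cpk_hat, n, conf=0.95):
    """Approximate confidence interval for Cpk (Bissell, 1990)."""
    z = norm.ppf(0.5 + conf / 2)
    se = sqrt(1 / (9 * n) + cpk_hat**2 / (2 * (n - 1)))
    return cpk_hat - z * se, cpk_hat + z * se

lo, hi = cpk_ci(1.35, n=100)
print(f"Cpk 1.35, n=100 -> 95% CI [{lo:.2f}, {hi:.2f}]")  # roughly [1.15, 1.55]
```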

Before/After Capability Comparison — Verifying Improvement

A before/after capability comparison is used at the end of a DMAIC Improve phase to confirm that an improvement action produced a real, statistically significant improvement — not a random fluctuation. The same three prerequisites apply to both datasets independently. Key considerations:

Valid comparison requires
  • ✓ Both datasets from stable processes (independently verified)
  • ✓ Both datasets passing normality (or same transformation applied)
  • ✓ Minimum 100 observations in each group
  • ✓ Same measurement system used for both (GR&R unchanged)
  • ✓ Statistical significance test on Cpk difference (use non-central F)
Common before/after errors
  • ✗ "After" data collected during unstable trial run (Hawthorne effect)
  • ✗ Sample sizes too small to detect a meaningful Cpk improvement
  • ✗ Gauge R&R changed between before and after studies
  • ✗ Declaring success from point estimates alone — use confidence intervals
  • ✗ Not waiting long enough for the "after" data to represent steady-state

Summary rule: Stability → Normality → Sufficient data. In that order, with no shortcuts. A Cpk computed without verifying all three is a number without a foundation. Compute it if you must, but flag it clearly as unvalidated and treat it as indicative only — never as a basis for a PPAP approval or a customer capability commitment.

Nelson Rules — All 8 Rules with Probabilities & Causes

Nelson Rules (also called Western Electric / Shewhart Rules) detect special cause variation. Each rule has a known false-alarm probability — this is the probability it triggers even when the process is in statistical control (i.e., common cause variation only).

# | Pattern | False alarm probability | Probable special cause
1 | 1 point more than 3σ from centreline | (1−0.9973) = 0.0027 | New operator, wrong setup, measurement error, out-of-spec material
2 | 7 points in a row on the same side of the centreline | (0.5)⁷ = 0.0078 | Process mean has shifted — setup change, tool wear, material batch change
3 | 7 points in a row all increasing or all decreasing | 0.0017 | Trend — tool wear, gradual deterioration, temperature drift
4 | 14 points in a row alternating up and down | 0.0002 | Over-control / tampering — operator adjusting too frequently
5 | 2 out of 3 consecutive points more than 2σ from centreline (same side) | 0.003 | New operator, wrong setup — similar to Rule 1 but detects smaller shifts
6 | 4 out of 5 consecutive points more than 1σ from centreline (same side) | 0.005 | Small sustained shift in process mean
7 | 14 points in a row within 1σ of centreline (either side) | (0.68)¹⁴ = 0.0045 | Process improvement, reduced variation, or stratified sampling mixing two distributions
8 | 8 points in a row more than 1σ from centreline (either side) | (1−0.68)⁸ = 0.0001 | Mixture of two processes — two machines, two shifts, or two operators being combined
Zone Labels (σ bands)

The control chart is divided into zones from the centreline outward:

  • Zone C — within 1σ of centreline (≈68% of points here)
  • Zone B — between 1σ and 2σ from centreline (≈27%)
  • Zone A — between 2σ and 3σ from centreline (≈4.3%)
  • Beyond 3σ — outside control limits (≈0.27%)
Practitioner Tips
  • ✓ Rule 1 (beyond 3σ) is always the most obvious — the simplest special cause signal
  • ✓ Rule 2 (7-in-a-row same side) is the most common exam scenario — a mean shift
  • ✓ Rule 4 (alternating 14 points) = over-control. The fix is to stop adjusting.
  • ✓ Rule 7 (hugging centreline) = artificially low variation, often from stratified subgroups mixing two processes
  • ✓ False alarm rate multiplies with each rule added — more rules = more false alarms
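To make the detection mechanics concrete, here is a minimal sketch in Python (hypothetical data and helper names) of Rules 1 and 2; the remaining rules follow the same run-counting pattern over sliding windows.

```python
import numpy as np

def nelson_rule_1(x, center, sigma):
    """Rule 1: any point more than 3 sigma from the centreline."""
    return np.abs(x - center) > 3 * sigma

def nelson_rule_2(x, center, run=7):
    """Rule 2: `run` consecutive points on the same side of the centreline."""
    side = np.sign(x - center)
    flags = np.zeros(len(x), dtype=bool)
    count = 0
    for i, s in enumerate(side):
        count = count + 1 if i > 0 and s == side[i - 1] and s != 0 else 1
        if count >= run:
            flags[i] = True
    return flags

rng = np.random.default_rng(1)
x = rng.normal(10.0, 0.5, 30)
x[20:] += 0.8                      # inject a mean shift at point 20
print("Rule 1 hits:", np.flatnonzero(nelson_rule_1(x, 10.0, 0.5)))
print("Rule 2 hits:", np.flatnonzero(nelson_rule_2(x, 10.0)))
```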

Process Capability Indices — Complete Reference

Capability indices quantify how well a process fits within its specification limits. Each member of the family (Cp, Cpk, Pp, Ppk) answers a slightly different question. Understanding when to use which index, and the conditions that must be met, is essential in day-to-day engineering practice.

Short-Term Capability Indices (Within σ)

Index | Formula | What it measures | Limitation
Cp | Cp = (USL − LSL) / (6·σwithin) | Potential capability — how wide the spec is relative to the process spread | Ignores centring. A high Cp with a poorly centred process will still produce defects
CpL | CpL = (X̄ − LSL) / (3·σwithin) | Lower capability — distance from mean to lower spec in σ units | One-sided; use when only a lower limit matters
CpU | CpU = (USL − X̄) / (3·σwithin) | Upper capability — distance from mean to upper spec in σ units | One-sided; use when only an upper limit matters
Cpk | Cpk = min(CpL, CpU) | Actual short-term capability — accounts for both spread and centring | The most commonly used index. If the process is perfectly centred, Cpk = Cp
Cr | Cr = 1/Cp = 6σ/(USL−LSL) | Capability ratio — percentage of tolerance used by the process | Cr × 100 = % tolerance consumed. Lower is better; Cr < 1.0 means Cp > 1.0

Long-Term Performance Indices (Overall σ)

Index | Formula | Key difference from Cp/Cpk
Pp | Pp = (USL − LSL) / (6·σoverall) | Uses overall (total) standard deviation — includes all sources of variation over time (between-subgroup + within-subgroup)
PpL | PpL = (X̄ − LSL) / (3·σoverall) | One-sided lower performance index
PpU | PpU = (USL − X̄) / (3·σoverall) | One-sided upper performance index
Ppk | Ppk = min(PpL, PpU) | Ppk ≤ Cpk always — the gap between Cpk and Ppk indicates how much the process mean has drifted or shifted over time
Cp vs Cpk vs Pp vs Ppk — Summary
 | Short-term (within σ) | Long-term (overall σ)
Potential (centring ignored) | Cp | Pp
Actual (centring included) | Cpk | Ppk
💡

If Cpk ≈ Ppk: the process is stable over time. If Cpk >> Ppk: the process has shifted or drifted — investigate between-subgroup variation.

Capability vs Rejection Rates
Cp / Cpk | Sigma level | Rejection rate
1.00 | 3σ | 0.27% (2,700 ppm)
1.33 | 4σ | 64 ppm
1.67 | 5σ | 0.6 ppm
2.00 | 6σ | 2 ppb
💡

Four conditions required: (1) sample represents population, (2) data is normally distributed, (3) process is in statistical control, (4) sample size is sufficient.
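The rejection rates in the table follow directly from the normal tail probability. A short sketch (Python with SciPy) for a centred process, assuming Cp = Cpk:

```python
from scipy.stats import norm

def rejection_ppm(cp):
    """Two-sided defect rate in ppm for a centred normal process (Cp = Cpk)."""
    z = 3 * cp                  # sigmas from the mean to each spec limit
    return 2 * norm.sf(z) * 1e6

for cp in (1.00, 1.33, 1.67, 2.00):
    print(f"Cp = {cp:.2f} -> {rejection_ppm(cp):,.4g} ppm")
# Cp = 1.00 -> ~2,700 ppm ... Cp = 2.00 -> ~0.002 ppm (2 ppb)
```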

Short-Run SPC — Monitoring Low-Volume Production

A typical control chart needs 20–25 subgroups (≈100 data points) to establish reliable control limits. Short-run SPC solves the problem of low-volume or mixed-part production where insufficient data exists for traditional charts.

The Problem

When producing different-diameter items (e.g. 300mm, 400mm, 500mm) in small runs of 8 each, options are:

  • ✗ 100% inspection — expensive
  • ✗ First-off inspection only — misses process variation
  • ✗ Last-off inspection — too late to react
  • ✗ Separate chart per part — too little data per chart
  • ✓ Short-run chart — plots all parts on one chart by transforming the data

Key Principle

Short-run SPC focuses on the process, not the product. By transforming raw measurements, parts with different nominal values can be plotted on a single chart — revealing process stability across multiple part numbers.

💡

Only valid if the different part runs have similar variance. If variance differs significantly between parts, a Z-MR chart (standardised) is needed instead.

Two Short-Run Chart Methods

Difference Chart (similar variance)

Subtract the nominal value for each run. Plot the deviations on a standard I-MR chart.

Difference = Actual − Nominal
Run A nominal = 300 → 302.6 − 300 = 2.6
Run B nominal = 500 → 504.2 − 500 = 4.2
Run C nominal = 400 → 400.5 − 400 = 0.5
→ Plot all differences on one I-MR chart
Z-MR Chart (different variance between runs)

Standardise each measurement using the run's own mean and standard deviation. The Z score is plotted — chart limits are always ±3 regardless of part.

Z = (Xi − X̄ᵣᵤₙ) / σᵣᵤₙ
UCL = +3, LCL = −3 always
CL = 0 always
→ All parts, all runs, one chart
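A minimal sketch of both transformations (Python, hypothetical run data) makes the difference concrete:

```python
import numpy as np

# Hypothetical short-run data: nominal value and measurements per run
runs = {
    "A (nom 300)": (300.0, np.array([302.6, 301.1, 299.4, 300.8])),
    "B (nom 500)": (500.0, np.array([504.2, 498.9, 501.3, 500.2])),
    "C (nom 400)": (400.0, np.array([400.5, 399.2, 401.1, 400.0])),
}

# Difference chart: subtract each run's nominal, then plot on one I-MR chart
diffs = np.concatenate([x - nom for nom, x in runs.values()])
print("Deviations for a single I-MR chart:", np.round(diffs, 1))

# Z-MR chart: standardise with each run's own mean and standard deviation
for label, (nom, x) in runs.items():
    z = (x - x.mean()) / x.std(ddof=1)
    print(f"{label}: Z = {np.round(z, 2)} (limits are always +/-3)")
```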

Cpk vs Ppk — Real-World Scenario with Full Worked Example

Cpk and Ppk look similar on paper but measure fundamentally different things. Cpk measures what your process can do when it's running well. Ppk measures what it actually does over extended time — including every shift change, raw material lot, and seasonal temperature swing. The gap between them tells a story about process management, not just process performance.

The Fundamental Difference

Cpk — Short-Term / Within
What the process can do

Uses σWithin — estimated from within-subgroup variation only. Strips out the noise from subgroup-to-subgroup shifts. Represents the process at its best, as if operating under one stable short-term condition.

Cpk = min[(USL−X̄)/(3·σW), (X̄−LSL)/(3·σW)]
Ppk — Long-Term / Overall
What the process actually does

Uses σOverall — the plain sample standard deviation across all observations. Includes every source of variation: within-subgroup, between-subgroup, drift, shift, operator, raw material. The real-world performance index.

Ppk = min[(USL−X̄)/(3·σO), (X̄−LSL)/(3·σO)]
How the Two Sigmas Are Computed (from Minitab Technical Documentation)
σWithin — Strips drift out
σ̂W = sp / c₄(d)
sp = √[Σ(xij−x̄i)² / Σ(ni−1)]
Pooled std dev across subgroups (default)

or: σ̂W = MR̄ / d₂(w)
Average moving range when n=1
σOverall — Includes everything
σ̂O = s / c₄(n)
s = √[ΣiΣj(xij−x̄)² / (n−1)]
Plain sample std dev, all data pooled

σ̂O ≥ σ̂W always
∴ Ppk ≤ Cpk always

Visual: Why Cpk > Ppk When the Process Drifts

📊 Short-term vs Long-term variation — how mean drift inflates σOverall
[Figure: three narrow within-shift curves (Shifts 1–3) inside LSL/USL vs one much wider overall curve — σOverall > σWithin, so Ppk < Cpk whenever the process drifts between shifts]

Each narrow blue curve is a subgroup's short-term behaviour — tight, capable, well within spec. But when all three shifts combine into the long-term picture (red dashed curve), the overall spread is much wider. This is why Ppk ≤ Cpk always. The gap is not measurement error — it's process management information.

Worked Example — Automotive Fuel Injector Flow Rate

The Scenario

A fuel injector flow rate must meet LSL = 195 cc/min, USL = 205 cc/min (tolerance = 10 cc/min). You run a production study: 25 subgroups of n=5 collected over 3 production shifts across 5 days. The process uses Rbar to estimate σWithin.

Capability Study Results
Grand mean X̄̄ = 200.8 cc/min
Average range R̄ = 2.34 cc/min
d₂(n=5) = 2.326
c₄(N=125) ≈ 0.998 (correction negligible at this sample size)
Overall s = 1.62 cc/min
n total = 125 observations
Step 1 — Compute Both Sigmas
σ̂W = R̄ / d₂(5) = 2.34 / 2.326
   = 1.006 cc/min

σ̂O = s / c₄(125) = 1.62 / 0.998
   = 1.623 cc/min
Step 2 — Compute Cpk (short-term)
CPU = (205 − 200.8) / (3 × 1.006)
    = 4.2 / 3.018 = 1.392
CPL = (200.8 − 195) / (3 × 1.006)
    = 5.8 / 3.018 = 1.922
Cpk = min(1.922, 1.392) = 1.39
Step 3 — Compute Ppk (long-term)
PPU = (205 − 200.8) / (3 × 1.623)
    = 4.2 / 4.869 = 0.863
PPL = (200.8 − 195) / (3 × 1.623)
    = 5.8 / 4.869 = 1.191
Ppk = min(1.191, 0.863) = 0.86
📋 Reading the Results — What Cpk = 1.39, Ppk = 0.86 Actually Means
Cpk = 1.39 (✓ Good)
When this process runs stably within a shift, it is capable — the machine can hit spec consistently. The process potential meets the standard for ongoing production (Cpk ≥ 1.33).
Ppk = 0.86 (✗ Poor)
Over the 5-day study, the process is not capable. The large gap (Cpk − Ppk = 0.53) reveals significant between-shift or between-day variation — likely from warm-up drift, operator differences, or raw material lot variation.
💡

The engineering decision: Do not report only Cpk. A customer seeing Cpk = 1.39 would approve the PPAP. But Ppk = 0.86 tells the real story — this process will produce field defects at rates far above what Cpk predicts. The correct action is to investigate the source of between-shift variation, fix it, then re-run the study with both indices reporting ≥ 1.33.
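The whole study is compact enough to verify in a few lines. A sketch (Python) reproducing both indices from the summary statistics, using c₄ for the full N = 125:

```python
lsl, usl = 195.0, 205.0
xbar, rbar, s = 200.8, 2.34, 1.62
d2_n5 = 2.326            # d2 constant for subgroup size 5
c4_125 = 0.998           # c4 for N = 125, essentially 1

sigma_w = rbar / d2_n5   # short-term sigma (within)
sigma_o = s / c4_125     # long-term sigma (overall)

cpk = min((usl - xbar) / (3 * sigma_w), (xbar - lsl) / (3 * sigma_w))
ppk = min((usl - xbar) / (3 * sigma_o), (xbar - lsl) / (3 * sigma_o))
print(f"sigma_within = {sigma_w:.3f}, sigma_overall = {sigma_o:.3f}")
print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}, gap = {cpk - ppk:.2f}")
# Cpk = 1.39, Ppk = 0.86, gap = 0.53
```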

Interpreting the Cpk–Ppk Gap

Cpk vs Ppk pattern | What it means | Typical root cause | Action
Cpk ≈ Ppk (gap < 0.1) | Process is stable over time | No significant between-subgroup drift. What you see in a short run is what you get long-term. | Report Ppk to customer. No additional investigation needed.
Cpk moderately > Ppk (gap 0.1–0.3) | Some long-term drift present | Gradual tool wear, ambient temperature, material lot variation. Process is capable but not perfectly controlled. | Investigate between-subgroup sources. Tighten control plan.
Cpk significantly > Ppk (gap > 0.3) | Serious stability problem | Shift changes, operator methods, machine warm-up, batch material variation. Multiple distinct process streams being reported as one. | Do not submit this PPAP. Conduct MSE (Multi-Stream Evaluation). Stratify data by suspected source.
Ppk > Cpk | Unusual — investigate | Within-subgroup variation is inflated (e.g. too much between-part variation sampled in one subgroup — irrational subgrouping). | Review subgrouping strategy. Rational subgroups should represent only short-term common-cause variation.

Confidence Intervals — Never Report a Point Estimate Alone

A Cpk of 1.33 computed from 30 samples has a very different meaning than the same value from 200 samples. Confidence intervals quantify this uncertainty. These formulas are from the Minitab capability analysis documentation.

Cp — χ²-based CI
Lower = Ĉp · √(χ²(1−α/2, ν) / ν)
Upper = Ĉp · √(χ²(α/2, ν) / ν)
ν = k(n−1), the degrees of freedom of the within-σ estimate
Cpk — normal-approximation CI
Lower = Ĉpk − z(α/2) · √(1/(9kn) + Ĉpk²/(2ν))
Upper = Ĉpk + z(α/2) · √(1/(9kn) + Ĉpk²/(2ν))
k = subgroups, n = average subgroup size
Pp — χ²-based CI (overall)
Lower = P̂p · √(χ²(1−α/2, kn−1) / (kn−1))
Upper = P̂p · √(χ²(α/2, kn−1) / (kn−1))
Ppk — normal-approximation CI
Lower = P̂pk − z(α/2) · √(1/(9kn) + P̂pk²/(2(kn−1)))
Upper = P̂pk + z(α/2) · √(1/(9kn) + P̂pk²/(2(kn−1)))
📌

Applied to our example (Cpk = 1.39, k=25, n=5): The 95% CI for Cpk is approximately [1.15, 1.63]. This means we cannot be certain the true Cpk exceeds 1.33 — it might be as low as 1.15. This is why 125 observations is borderline for formal PPAP submission; aim for 175+ to tighten the CI.
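A minimal sketch of the Cpk interval (Python with SciPy), using ν = k(n−1) as the within-σ degrees of freedom; packages differ slightly in their df conventions, so expect small deviations from the interval quoted above:

```python
from scipy.stats import norm

def cpk_ci(cpk, k, n, alpha=0.05):
    """Normal-approximation CI for Cpk (Bissell-type)."""
    kn = k * n                       # total observations
    nu = k * (n - 1)                 # df of the within-sigma estimate
    se = (1 / (9 * kn) + cpk**2 / (2 * nu)) ** 0.5
    z = norm.ppf(1 - alpha / 2)
    return cpk - z * se, cpk + z * se

lo, hi = cpk_ci(1.39, k=25, n=5)
print(f"95% CI for Cpk: [{lo:.2f}, {hi:.2f}]")   # roughly [1.19, 1.59]
```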

σ Estimation Methods — Which Formula Does Your Software Use?

Cpk and Ppk use different sigma estimates, and within each, there are multiple methods depending on subgroup size and data structure. Understanding which formula applies to your situation prevents misinterpretation — especially when comparing indices across software platforms.

Overview — When Each Method Applies

Sigma type | Method | When used | Used for
σWithin (short-term — Cp, Cpk) | Pooled std dev | Subgroup size n > 1 (default) | Cp, Cpk, UCL/LCL on X̄ chart
σWithin (short-term — Cp, Cpk) | Rbar (average range) | Subgroup size n > 1, alternative method | X̄-R charts — traditional method
σWithin (short-term — Cp, Cpk) | Average moving range (MR̄) | Subgroup size n = 1 (default) | I-MR charts — individual measurements
σOverall (long-term — Pp, Ppk) | Sample std dev | All scenarios | Pp, Ppk — always this formula

σOverall — The Long-Term Standard Deviation

Always the plain sample standard deviation across all observations, corrected by the c₄ unbiasing constant. This is the denominator for Pp and Ppk.

Formula (Minitab default)
σ̂Overall = s / c₄(n)

s = √[ ΣiΣj(xij − x̄)² / (n − 1) ]

where n = total observations, x̄ = grand mean across all data
c₄(n) → 1 as n → ∞ (correction negligible for n > 50)
⚠️

σOverall includes all sources of variation: within-subgroup + between-subgroup + drift + shift + any systematic effects. It is always ≥ σWithin, which is why Ppk ≤ Cpk always.

σWithin Method 1 — Pooled Standard Deviation (Default, n > 1)

The default method when subgroup size > 1. Pools variance across all subgroups, then applies the c₄ unbiasing constant. This is what Minitab and most SPC software use by default.

Pooled Standard Deviation Formula
σ̂Within = sp / c₄(d)

sp = √[ ΣiΣj(xij − x̄i)² / Σi(ni−1) ]

d = Σ(ni−1) + 1    (degrees of freedom)

When subgroup size is constant: sp = √(Σsi² / k), d = n − k + 1

σWithin Method 2 — Rbar (Average Range, n > 1)

The traditional control chart method — divides the average range by the d₂ constant. Used on X̄-R charts. Equivalent to pooled std dev when subgroup size is constant, but less efficient for unequal subgroup sizes.

Rbar Formula (equal subgroup sizes)
σ̂Within = R̄ / d₂(ni)

R̄ = (R₁ + R₂ + ... + Rk) / k

Unequal subgroup sizes: uses weighted formula fi = [d₂(ni)]² / [d₃(ni)]²

σWithin Method 3 — Average Moving Range (Default, n = 1)

When individual measurements are collected (subgroup size = 1), within-subgroup variation is estimated from consecutive differences — the moving range. This is the I-MR chart approach.

Average Moving Range Formula (w=2, default)
σ̂Within = MR̄ / d₂(w)

MRi = |xi − xi−1|    (for w=2, consecutive pairs)

MR̄ = (MR2 + MR3 + ... + MRn) / (n − w + 1)

d₂(2) = 1.128 · Median MR variant: σ̂ = MR̃ / d₄(w)

σWithin Method 4 — Sbar (Average of Subgroup Standard Deviations)

Used on X̄-s charts. More efficient than Rbar for large subgroup sizes (n > 10). Applies c₄ weighting per subgroup.

Sbar Formula (unequal subgroup sizes)
σ̂Within = Σ[hi·si/c₄(ni)] / Σhi

hi = [c₄(ni)]² / [1 − c₄(ni)²]

When subgroup size is constant: σ̂ = s̄ / c₄(n), s̄ = Σsi/k
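The estimators themselves are only a few lines each. A sketch (Python, with constants hard-coded from the reference table that follows) of the Rbar, moving-range, and overall methods:

```python
import numpy as np

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}       # d2 constants
C4 = {2: 0.7979, 3: 0.8862, 4: 0.9213, 5: 0.9400}   # c4 constants

def sigma_within_rbar(subgroups):
    """Rbar method: average subgroup range / d2 (equal subgroup sizes)."""
    n = len(subgroups[0])
    rbar = np.mean([max(g) - min(g) for g in subgroups])
    return rbar / D2[n]

def sigma_within_mr(x, w=2):
    """Average moving range / d2(w) for individual (n = 1) data."""
    mr = np.abs(np.diff(x))
    return mr.mean() / D2[w]

def sigma_overall(x):
    """Plain sample standard deviation; c4 correction is negligible for large n."""
    return np.std(x, ddof=1)

subgroups = [[9.9, 10.1, 10.0, 10.2, 9.8], [10.3, 10.1, 10.4, 10.2, 10.5]]
print("sigma_within (Rbar):", round(sigma_within_rbar(subgroups), 3))
print("sigma_overall      :", round(sigma_overall(np.concatenate(subgroups)), 3))
```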

Unbiasing Constants — c₄ and d₂ Reference Table

These constants correct the bias in sigma estimates from small samples. c₄ is used with standard deviations; d₂ is used with ranges. Both approach 1 as sample size increases.

n (subgroup size) | c₄ | Used in σ̂ = s/c₄
2 | 0.7979 | Pooled σ, Sbar
3 | 0.8862 |
4 | 0.9213 |
5 | 0.9400 | Most common
6 | 0.9515 |
8 | 0.9650 |
10 | 0.9727 |
25 | 0.9896 |
∞ | 1.0000 | Bias negligible

n (subgroup size) | d₂ | Used in σ̂ = R̄/d₂
2 | 1.128 | MR chart (w=2)
3 | 1.693 |
4 | 2.059 |
5 | 2.326 | Most common
6 | 2.534 |
7 | 2.704 |
8 | 2.847 |
9 | 2.970 |
10 | 3.078 |

Which Method Should You Use?

n = 1
Individual data
σ̂ = MR̄/d₂(2)

Use I-MR chart. Default w=2. Average moving range divided by 1.128.

n = 2–9
Small subgroups
σ̂ = sp/c₄(d)

Pooled std dev (default) or Rbar. Use X̄-R chart. Pooled is more efficient.

n ≥ 10
Large subgroups
σ̂ = s̄/c₄(n)

Sbar method. Use X̄-s chart. Range method loses efficiency at n > 9.

💡

Source: All sigma estimation formulas on this page are from the Minitab Technical Support Document — Capability Analysis (Normal) Formulas: Capability Statistics (Default) and the Minitab Assistant White Paper on Capability Analysis. The c₄ and d₂ constants follow Montgomery (2001), Introduction to Statistical Quality Control, Wiley. These are the industry-standard formulas used in all major SPC software.

Applied Statistics

Quantitative Methods & Statistics

Hypothesis testing, confidence intervals, regression, ANOVA, probability distributions, and time-series analysis — the statistical toolkit every quality engineer needs to turn data into defensible decisions.

Data Types, Collection & Descriptive Statistics

Data Classification

Category | Type | Characteristics | Examples
Qualitative (description-based) | Nominal | Categories only — no order, no arithmetic. Central tendency: mode only. | Colour (Red/Blue), Pass/Fail, Product type
Qualitative (description-based) | Ordinal | Ordered categories — differences not meaningful. Central tendency: mode, median. | Good/Bad/Worst, 1–5 star rating, Likert scale
Quantitative (number-based) | Interval | Ordered + equal intervals — no true zero. All central tendency measures valid. | Temperature °C, Calendar year, IQ score
Quantitative (number-based) | Ratio | Ordered + equal intervals + true zero. All calculations valid. | Length, Mass, Volume, Time, Temperature K
Continuous vs Discrete

Continuous: Can take any value in a range. Measurements — length, height, time, temperature. More sensitive, fewer samples needed, but more expensive to collect.

Discrete: Countable, whole numbers only. Number of defects, number of students, yes/no outcomes.

NOIR Mnemonic

Nominal → Ordinal → Interval → Ratio. Each level adds a property: order → equal intervals → true zero. Statistics valid at a lower level remain valid for higher-level data, but higher-level statistics cannot be applied to lower-level data.

Data Collection Plan

Element | Content
Why collect? | Goal, objective, business question to answer
Operational definition | Precise definition of what is being measured — avoids ambiguity between collectors
How much / how / where / when | Sample size, frequency, location, time windows
Type of data | NOIR scale — determines which statistics and charts are appropriate
Collection method | Manual (check sheet) or automatic (sensors, gauges)
Past vs future data | Historical data may have biases; prospective data is preferred
Reliability | Is the measurement system capable? (MSA first)
Data Coding

Transforming data to simplify calculations:

  • Add/Subtract: Mean shifts by the same amount. Standard deviation unchanged.
  • Multiply/Divide: Both mean and SD scale by the same factor.
  • Truncation: Remove a repetitive prefix (e.g. values of the form 0.55x: multiply by 1000, then subtract 550). Reverse the transform to recover the original mean and SD.
Data Quality
  • Imputation: Replacing missing data with substituted values (e.g. row mean). Missing data introduces bias.
  • Benford's Law: In natural data sets, digit 1 appears as leading digit ~30% of the time; digit 9 <5%. Violations can indicate data fabrication or errors.
  • Integrity risks: Bias, lack of knowledge, boredom, rounding, intentional falsification
Worked Example · One dataset · All major measures
Descriptive Statistics — From Raw Data to Meaning

Descriptive statistics are not just mean, median, and mode. A complete descriptive summary explains center, spread, position, frequency, and shape. The goal is to answer five questions: Where is the data centered? How much does it vary? Where do observations sit within the distribution? How often do values occur? And does the shape suggest skewness, heavy tails, or outliers?

Example context: Below is one raw dataset of 30 process measurements. We use the same numbers to explain central tendency, dispersion, position, frequency, and shape — exactly the way descriptive statistics are reported in tools like Excel and Minitab.

44.8, 45.1, 45.3, 45.9, 46.0, 46.2, 46.4, 46.8, 47.1, 47.4, 47.8, 48.0, 48.2, 48.6, 48.9, 49.1, 49.5, 49.9, 50.4, 50.8, 51.3, 51.9, 52.4, 52.8, 53.4, 54.1, 55.0, 56.4, 58.7, 63.2
Mean
50.05
Median
49.00
Std Dev
4.31
Skewness
1.19
Excess Kurtosis
1.27
One Graph — Full Descriptive Statistics Story
This single figure combines frequency, cumulative position, central tendency, quartiles, and tail behavior so users can visually connect the numbers to the shape of the distribution.
[Figure: histogram with a cumulative-% curve and vertical markers for the mean, median, quartiles, and 90th percentile]
The histogram shows frequency. The rising line shows cumulative position. Vertical markers show the mean, median, quartiles, and the 90th percentile. The long right tail and the single large high-end observation (63.2) make the distribution right-skewed.
Center
Mean 50.05 vs Median 49.00
The mean sits slightly above the median because the right tail pulls the average upward.
Spread
SD 4.31 · IQR 5.40
Standard deviation shows total variation; IQR focuses on the stable middle of the data.
Shape
Skew 1.19 · Excess kurtosis 1.27
Positive skew and elevated kurtosis indicate right-tail risk and a few unusually high values.

How to Read Descriptive Statistics

1) Central Tendency — Where is the data centered?

Central tendency describes the “typical” value. The mean uses all observations and shifts toward extreme values. The median is the middle observation and is more stable when data is skewed. The mode is the most frequent value; for continuous measurements it is often estimated by grouping or rounding. In this dataset, the mean (50.05) is slightly above the median (49.00), which hints at a right tail pulling the average upward.

2) Dispersion — How spread out is the data?

Dispersion measures consistency. The range is the full width from minimum to maximum (18.4). The variance (18.59) uses squared deviation, while the standard deviation (4.31) expresses spread in the original units. The IQR (5.40) focuses on the middle 50% of the data and is less sensitive to outliers. A process can have a good mean but still be poor if dispersion is too large.

3) Position — Where do values sit inside the distribution?

Position measures rank. Quartiles divide the data into four parts: Q1 = 46.88, median = Q2 = 49.00, Q3 = 52.27. Percentiles give the value below which a chosen percentage falls. Here the 10th percentile is 45.84 and the 90th percentile is 55.14. These are extremely useful for reporting tails, customer risk, and threshold-based performance.

4) Frequency — How often do values occur?

Frequency tells you how observations are distributed across intervals. The histogram is the main visual tool: tall bars mean many observations in that region, short bars mean few. In descriptive output this idea also appears as counts, relative frequency, and cumulative frequency. Frequency is what turns raw numbers into an interpretable distribution.

5) Shape — Is the distribution symmetric, skewed, or heavy-tailed?

Shape goes beyond average and spread. Skewness (1.19) measures asymmetry: positive skew means a longer right tail, negative skew means a longer left tail, and zero means near-symmetry. Kurtosis looks at tail heaviness and outlier-proneness. The excess kurtosis here is 1.27: values above zero indicate heavier tails than normal, values below zero indicate lighter tails. Shape matters because non-normal shape changes how you interpret means, control limits, and capability.

Shape in Depth — Skewness & Kurtosis

Skewness measures asymmetry: how far the distribution leans. A value of 0 means perfect symmetry. Positive values indicate a long right tail (mean > median), negative values a long left tail (mean < median). In quality engineering, right skewness often signals occasional high-value outliers — tool wear, burst events, occasional defects. Kurtosis measures tail weight. Excess kurtosis = 0 means the tails match a normal distribution. Positive excess kurtosis (leptokurtic) means more extreme values occur than expected — critical for capability analysis because DPMO estimates derived from Cp/Cpk assume normality. In this dataset, skewness = 1.19 and excess kurtosis = 1.27 — both moderate, indicating a slightly heavier right tail and more occasional high outliers than a pure normal would predict.

Skewness — Three Distribution Shapes Compared

Skewness tells you which direction the data has a longer tail, and where the mean sits relative to the median and mode. Rule of thumb: |skewness| < 0.5 = approximately symmetric; 0.5–1.0 = moderate skew; >1.0 = strong skew.

📊 [Figure: skewness compared — negative skew (long left tail), near-zero (symmetric), positive skew (long right tail)]
📊 [Figure: kurtosis compared — platykurtic (excess < 0: broad flat peak, thin tails, e.g. uniform distribution); mesokurtic (excess = 0: the normal reference, e.g. measurement error); leptokurtic (excess > 0: sharp tall peak, fat tails, e.g. financial returns, rare events)]
💡

Quality engineering rule: Always check both skewness and excess kurtosis before computing Cp/Cpk. If |skewness| > 1 or |excess kurtosis| > 2, consider non-normal capability analysis (Weibull, Johnson transformation, or percentile-based methods) instead of assuming normality.

Descriptive Statistics — Central Tendency

Measure | Definition | Formula / method | Properties
Mean (x̄) | Arithmetic average | x̄ = Σx / n | Affected by extreme values (outliers). Used for ratio/interval data.
Mode | Most frequently occurring value | Count occurrences; highest count wins | Only average valid for nominal data. A dataset can have multiple modes (bimodal).
Median | Middle value when sorted ascending | Odd n: middle value. Even n: average of the two middle values. | Not affected by outliers. Preferred for skewed distributions.
Percentile | Value below which P% of data falls | i = P·n/100. If i is whole: average of positions i and i+1. If not: round up to the next position. | Q1 = 25th, Q2 = 50th (median), Q3 = 75th percentile

Descriptive Statistics — Variability

Range
R = Max − Min

Simplest measure of spread. Sensitive to outliers. Example: (6, 9, 10, 11, 11, 14) → R = 14−6 = 8

Interquartile Range (IQR)
IQR = Q3 − Q1

Range of middle 50% of data. Robust to outliers. Example: Q3=11, Q1=9 → IQR = 2. Used in box-and-whisker plots.

Standard Deviation
s² = Σ(xᵢ−x̄)² / (n−1)
s = √s²

Average squared deviation from mean (sample formula uses n−1 for unbiasedness). Example: data (98, 99, 100, 101, 102, 100) → s²=2, s=1.414
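Every number quoted for the 30-point example dataset can be reproduced in a few lines (Python with NumPy/SciPy; NumPy's default linear percentile interpolation matches the quartiles quoted above):

```python
import numpy as np
from scipy import stats

x = np.array([44.8, 45.1, 45.3, 45.9, 46.0, 46.2, 46.4, 46.8, 47.1, 47.4,
              47.8, 48.0, 48.2, 48.6, 48.9, 49.1, 49.5, 49.9, 50.4, 50.8,
              51.3, 51.9, 52.4, 52.8, 53.4, 54.1, 55.0, 56.4, 58.7, 63.2])

print("mean  :", round(x.mean(), 2))               # 50.05
print("median:", np.median(x))                     # 49.0
print("stdev :", round(x.std(ddof=1), 2))          # 4.31
print("range :", round(x.max() - x.min(), 1))      # 18.4
q1, q3 = np.percentile(x, [25, 75])
print("IQR   :", round(q3 - q1, 2))                # 5.40
print("skew  :", round(stats.skew(x, bias=False), 2))       # ~1.19
print("kurt  :", round(stats.kurtosis(x, bias=False), 2))   # ~1.27 (excess)
```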

Graphical Methods for Depicting Data

Each chart below is rendered from real sample data. Understanding the shape, landmarks, and interpretation of each is essential in everyday engineering practice.

Chart 1

Histogram

What it shows: Frequency distribution — the shape, centre, and spread of continuous data. Values are grouped into bins; bar height = count in that bin. Bars touch (no gaps) because data is continuous.

Key features: Shape reveals distribution type — normal (bell), right-skewed, left-skewed, bimodal, or uniform. Overlay a normal curve to visually pre-check normality before running a Q-Q plot.

Engineering Use

First look at any dataset. Identify modality, skew, and outliers before any statistical test. Required in DMAIC Measure phase.

[Histogram: measurement value (x-axis) vs frequency (y-axis), with a fitted normal curve overlay]
Chart 2

Box-and-Whisker Plot

What it shows: The five-number summary — Min, Q1, Median, Q3, Max — in a single compact visual. The box spans Q1 to Q3 (the Interquartile Range, IQR). The line inside the box is the Median. Whiskers extend to Min and Max within 1.5×IQR. Points beyond whiskers are outliers.

Key formula: IQR = Q3 − Q1. Outlier threshold = Q3 + 1.5×IQR (upper) or Q1 − 1.5×IQR (lower).

Engineering Use

Compare multiple distributions side by side. Instantly reveals skew, spread, and outliers. Use in MSA to compare operator variation.

[Box plots for Processes A, B, C: box spans Q1–Q3, line at the median, whiskers to min/max, one outlier flagged]
Chart 3

Stem-and-Leaf Plot

What it shows: The full distribution of data while keeping every original value visible. Each data point is split: the stem = leading digit(s), the leaf = the last digit. Reading the leaves left-to-right on each row gives you a mini histogram rotated 90°.

Example data: 21, 24, 26, 28, 31, 33, 35, 37, 39, 41, 43, 46, 48, 52, 55, 58

Engineering Use

Best for small datasets (n < 50). Reveals shape, outliers, and gaps — and unlike a histogram, you can read back every original data value.

Stem-and-Leaf Plot (n = 16)
Stem | Leaves
 2 | 1 4 6 8
 3 | 1 3 5 7 9
 4 | 1 3 6 8
 5 | 2 5 8
Stem = tens digit, Leaf = units digit (e.g. 2 | 4 = 24)
Chart 4

Normal Probability Plot (Q-Q Plot)

What it shows: Whether your data follows a normal distribution. Data quantiles are plotted against theoretical normal quantiles. If the data is normal, all points fall on or very close to the diagonal reference line.

Interpretation: Points hugging the line ✓ normal. S-curve = skewed. Banana curve = heavy tails. A single point far off-line = outlier. Use p-value > 0.05 (Anderson-Darling or Kolmogorov-Smirnov) to confirm at 95% confidence.

Engineering Use

Required before running capability analysis (Cp/Cpk). Non-normal data must be transformed or analyzed with non-parametric methods.

[Q-Q plot: sample quantiles vs theoretical standard-normal quantiles; points hug the reference line; AD p-value = 0.312 > 0.05 → Normal ✓]
When to use which chart
Histogram
Shape & distribution of large datasets. First step in any analysis.
Box-and-Whisker
Compare multiple groups. Spot outliers and skew at a glance.
Stem-and-Leaf
Small datasets (n < 50). See every original value in context.
Q-Q Plot
Test normality before Cp/Cpk. Always run before capability studies.

Probability — Models, Rules & Distributions

Probability Models

Classic (A Priori) Model
P(A) = Outcomes in A / Total outcomes

Used when all outcomes are equally likely and can be counted theoretically. Example: P(rolling a 3) = 1/6. No experiment needed.

Relative Frequency (Empirical) Model
P(A) = Times A occurred / Total trials

Used when theoretical probability is unknown — estimate from observed data. Approaches true probability as n → ∞. Example: defect rate from production history.

Counting — Factorial, Permutations & Combinations

Concept | Formula | Order matters? | Example
Factorial | n! = n×(n−1)×…×1; 0! = 1 | — | 5! = 5×4×3×2×1 = 120
Permutation | P(n,r) = n!/(n−r)! | Yes — order matters | 4-digit lock code with distinct digits: P(10,4) = 5,040 arrangements
Combination | C(n,r) = n!/[r!(n−r)!] | No — order irrelevant | Select 2 from 5 students: C(5,2) = 10 groups

Key Probability Distributions — Summary Table

Distribution | Type | Key parameters | Conditions / when to use | Mean | Variance
Normal | Continuous | μ, σ | Symmetric, bell-shaped. Central Limit Theorem. 68/95/99.7 rule. Z = (X−μ)/σ | μ | σ²
t (Student's) | Continuous | df = n−1 | Small samples (n < 30) or unknown σ. Wider than normal; converges to normal as df→∞ | 0 | df/(df−2)
Chi-square (χ²) | Continuous | df = n−1 | Testing population variance; goodness of fit; independence in contingency tables. χ² = (n−1)s²/σ² | df | 2·df
F | Continuous | df₁, df₂ | Comparing two variances; ANOVA F-ratio = MS_between/MS_within. Always right-tailed. | df₂/(df₂−2) |
Binomial | Discrete | n, p | Fixed n trials; 2 outcomes; constant p; independent. P(x) = C(n,x)·pˣ·(1−p)ⁿ⁻ˣ | np | np(1−p)
Bernoulli | Discrete | p | Binomial with n=1 (single trial). P(success) = p, P(failure) = 1−p | p | p(1−p)
Hypergeometric | Discrete | N, A, n | Sampling without replacement from a finite population. Use instead of binomial when n > 5% of N. P(x) = C(A,x)·C(N−A,n−x)/C(N,n) | nA/N |
Poisson | Discrete | μ | Rare events in a fixed region. Mean = variance = μ. P(x;μ) = e⁻μ·μˣ/x! | μ | μ

Confidence Intervals — Complete Reference

A confidence interval provides a range within which the true population parameter is believed to lie with a stated probability (confidence level). The width is controlled by sample size, standard deviation, and confidence level.

CI for Mean — z-based (σ known or n ≥ 30)

CI = x̄ ± zα/2 · (σ / √n)
Confidence | α | zα/2
90% | 0.10 | 1.645
95% | 0.05 | 1.96
99% | 0.01 | 2.576
Worked Example

100 random residents, x̄ = $42,000, σ = $5,000. Find 95% CI.

CI = 42,000 ± 1.96 × (5,000/√100)
CI = 42,000 ± 1.96 × 500
CI = 42,000 ± 980
CI = $41,020 to $42,980

CI for Mean — t-based (σ unknown and n < 30)

CI = x̄ ± tα/2, n-1 · (s / √n)

Use t-distribution with (n−1) degrees of freedom. As n increases, t → z.

Worked Example

n=25, x̄=$42,000, s=$5,000. Find 95% CI. t0.025,24 = 2.064

CI = 42,000 ± 2.064 × (5,000/√25)
CI = 42,000 ± 2,064
CI = $39,936 to $44,064

CI for Proportion

CI = p̂ ± zα/2 · √(p̂(1−p̂)/n)

Conditions: np ≥ 5 AND n(1−p) ≥ 5 (to approximate binomial with normal)

Worked Example

n=100, 10 defective (p̂=0.10). Find 95% CI.

np=10 ≥ 5 ✓ n(1−p)=90 ≥ 5 ✓
CI = 0.10 ± 1.96×√(0.10×0.90/100)
CI = 0.10 ± 0.06
CI = 0.04 to 0.16 (4% to 16%)

CI for Variance (Chi-square)

(n−1)s² / χ²α/2 ≤ σ² ≤ (n−1)s² / χ²1-α/2

χ² is not symmetric — use two separate chi-square table values for the two tails.

Worked Example

n=25, s²=4. Find 90% CI for σ². χ²0.05,24=36.42, χ²0.95,24=13.848

Lower: (24×4)/36.42 = 2.64
Upper: (24×4)/13.848 = 6.93
90% CI for σ²: 2.64 to 6.93
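All four worked examples can be checked in a few lines (Python with SciPy):

```python
import numpy as np
from scipy import stats

z = stats.norm.ppf(0.975)                                 # 1.96

# z-based CI for a mean: n = 100, xbar = 42,000, sigma = 5,000
print("z CI : 42000 +/-", round(z * 5000 / np.sqrt(100)))  # 980

# t-based CI: n = 25, s = 5,000
t = stats.t.ppf(0.975, df=24)                              # 2.064
print("t CI : 42000 +/-", round(t * 5000 / np.sqrt(25)))   # 2064

# CI for a proportion: 10 defective out of 100
p = 0.10
half = z * np.sqrt(p * (1 - p) / 100)
print(f"p CI : {p - half:.2f} to {p + half:.2f}")          # 0.04 to 0.16

# Chi-square CI for a variance: n = 25, s^2 = 4, 90% confidence
lo = 24 * 4 / stats.chi2.ppf(0.95, df=24)
hi = 24 * 4 / stats.chi2.ppf(0.05, df=24)
print(f"var CI: {lo:.2f} to {hi:.2f}")                     # 2.64 to 6.93
```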

Hypothesis Testing — 38 Tests, 6 Families

Every hypothesis test follows the same 6-step logic. What changes is the test statistic and its distribution. Master the framework once — apply it to all 38 tests.

The 6 Families — Complete Decision Tree
HYPOTHESIS TEST — what type of data & question?
① Parametric means (9 tests): 1-Sample z, 1-Sample t, 2-Sample z, Independent t (pooled), Welch's t, Paired t, One-Way ANOVA, Two-Way ANOVA, Repeated Measures ANOVA
② Post-hoc (5 tests): Tukey HSD, Bonferroni, Scheffé, Duncan's, Newman-Keuls
③ Proportions (7 tests): 1-Proportion z, 2-Proportion z, χ² Goodness of Fit, χ² Independence, Fisher's Exact, McNemar's, Cochran's Q
④ Variance (5 tests): F-test (2 variances), Levene's, Bartlett's, χ² variance test, Brown-Forsythe
⑤ Non-parametric (7 tests): Wilcoxon Signed-Rank, Mann-Whitney U, Kruskal-Wallis, Friedman, Sign test, Spearman ρ, Kendall's τ
⑥ Correlation / regression / normality (8 tests): Pearson r, Regression t-test, Overall F (regression), Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, Durbin-Watson, Log-Rank
Select a family tab below to explore each test with formulas, graphs & worked examples.

Universal 6-Step Framework — Every Test Uses This

1. State H₀. Null: no effect, no difference, status quo.
2. State H₁. Alternative: what you want to prove.
3. Set α. Usually 0.05; decide before seeing the data.
4. Compute. Calculate the test statistic from your data.
5. p-value. P(data this extreme | H₀ true); p < α → reject H₀.
6. Conclude. Engineering meaning, not just reject/fail to reject.
Family ① — 9 Tests
Parametric Means Tests

Use when your response is continuous and approximately normally distributed (or n ≥ 30, by the Central Limit Theorem). You are comparing one or more means. If normality is badly violated with small n, switch to Family ⑤ non-parametric alternatives.

1 · One-Sample z-Test
σ known · n ≥ 30
When to Use This Test
✓ You have one sample and want to test whether its mean equals a known target μ₀
✓ Population std dev σ is known from engineering specs or prior studies
n ≥ 30 — CLT ensures sampling distribution is approximately normal even if data isn't
✗ σ unknown and n < 30 — use the one-sample t-test instead
The Formula
z = (x̄ − μ₀) / (σ / √n)
Symbol | Meaning | In practice
x̄ | Sample mean | Average of your n measurements
μ₀ | Hypothesised population mean | The target or specification value you are testing against
σ | Population standard deviation | Known from process history, engineering spec, or prior studies
n | Sample size | Number of observations in your sample
σ/√n | Standard error of the mean | How much x̄ varies from sample to sample — shrinks as n grows
Decision rule:  Two-tail: reject H₀ if |z| > zα/2  ·  Upper-tail: reject if z > zα  ·  Lower-tail: reject if z < −zα
z0.025 = 1.960 (two-tail 95%)    z0.05 = 1.645 (one-tail 95%)
Engineering Example — CNC Bolt Diameter
Scenario: A CNC machine produces bolts with a specified diameter of μ₀ = 10.000 mm. From historical process data, σ = 0.050 mm is known. A quality engineer samples n = 64 bolts and measures x̄ = 10.012 mm. Has the machine drifted off-centre? Use α = 0.05, two-tail.
Step-by-Step Solution
① State Hypotheses
H₀: μ = 10.000 mm
H₁: μ ≠ 10.000 mm (two-tail)
② Calculate Standard Error
SE = σ/√n = 0.050/√64 = 0.050/8
SE = 0.00625 mm
③ Compute Test Statistic
z = (10.012 − 10.000) / 0.00625
z = 0.012 / 0.00625 = 1.92
④ Find Critical Value
zcrit = ±1.960 (α=0.05, two-tail)
p-value ≈ 0.055
⑤ Decision
|1.92| < 1.960 → Fail to reject H₀
p = 0.055 > α = 0.05
Rejection Region Diagram
[Rejection region diagram: two-tail boundaries at ±1.96; z = 1.92 falls just inside the +1.96 boundary — p = 0.055, borderline]
Engineering Conclusion: No statistical evidence the machine has drifted at the 5% level. However p = 0.055 is borderline — increase sample size to n = 100 to detect a 0.012 mm shift more reliably, or investigate if the drift direction is consistently positive.
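The same five steps in code (Python with SciPy):

```python
from scipy.stats import norm

mu0, sigma, n, xbar = 10.000, 0.050, 64, 10.012
se = sigma / n ** 0.5                 # 0.00625
z = (xbar - mu0) / se                 # 1.92
p = 2 * norm.sf(abs(z))               # two-tailed p-value
print(f"z = {z:.2f}, p = {p:.3f}")    # z = 1.92, p = 0.055
```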
2 · One-Sample t-Test
σ unknown · any n
When to Use This Test
✓ One sample — comparing mean to a known target μ₀
✓ σ is unknown — you estimate it from your sample as s
✓ Works for any sample size — even n = 5 or n = 10
📌 The most common single-sample test in practice — default when σ is unknown
The Formula
t = (x̄ − μ₀) / (s / √n)
degrees of freedom  df = n − 1
Symbol | Meaning | In practice
x̄ | Sample mean | Average of your measurements
μ₀ | Hypothesised mean | The target or spec value you are testing against
s | Sample standard deviation | Estimated from data: s = √[Σ(xᵢ−x̄)²/(n−1)]
s/√n | Standard error of the mean | Uncertainty in x̄ due to finite sample size
df | Degrees of freedom | n−1. Determines which t-distribution to use. t → z as df → ∞
Key difference from z-test: The t-distribution has heavier tails than the normal distribution, making it harder to reject H₀ with small samples — correctly accounting for the extra uncertainty from estimating σ with s. As n increases, t → z.
Engineering Example — Fill Weight Verification
Scenario: A packaging line targets μ₀ = 500 g fill weight. An engineer collects a sample of n = 9 packs and measures: x̄ = 497 g, s = 6 g. Is the machine under-filling? Use α = 0.05, lower one-tail (we only care if it's too low).
① Hypotheses
H₀: μ ≥ 500g    H₁: μ < 500g
② Standard Error
SE = s/√n = 6/√9 = 6/3 = 2.0 g
③ Test Statistic
t = (497 − 500) / 2.0 = −3/2 = −1.50
④ Critical Value (df = 8)
tcrit = −1.860 (lower-tail, α=0.05, df=8)
⑤ Decision
−1.50 > −1.860 → Fail to reject H₀
p ≈ 0.086 > 0.05
Conclusion: No statistical evidence of under-filling at 5% level. p = 0.086 is noteworthy though — with n = 9 this test has low power. Increase to n = 25 to detect a 3g shift reliably.
t vs z — Why Tails Are Heavier
[t vs z comparison: t (df=8) has heavier tails than the z reference; critical value −1.860, observed t = −1.50 falls in the fail-to-reject region]
3 · Two-Sample z-Test
2 groups · σ₁ σ₂ known · n₁ n₂ ≥ 30
When to Use This Test
✓ Comparing means of two independent groups
✓ Both σ₁ and σ₂ are known, OR both n₁ ≥ 30 and n₂ ≥ 30
✓ Samples are drawn independently from two populations
✗ σ unknown or small n — use independent t-test (Tests 4 or 5)
The Formula
z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
Symbol | Meaning
x̄₁ − x̄₂ | Observed difference between the two sample means
σ₁², σ₂² | Known population variances for groups 1 and 2
√(σ₁²/n₁ + σ₂²/n₂) | Standard error of the difference — how much x̄₁−x̄₂ varies by chance
Engineering Example — Comparing Two Production Plants
Scenario: Plant A and Plant B both produce aluminium castings. Historical σ values are known from long-running process control. Does tensile strength differ between plants? α = 0.05, two-tail.
Plant A: n₁=40, x̄₁=52.1 MPa, σ₁=2.0
Plant B: n₂=35, x̄₂=50.8 MPa, σ₂=2.2
SE = √(4.0/40 + 4.84/35)
SE = √(0.100 + 0.138) = √0.238 = 0.488
z = (52.1−50.8) / 0.488 = 1.3/0.488 = 2.66
zcrit = ±1.960 (two-tail, α=0.05)  ·  2.66 > 1.960 → Reject H₀  ·  p ≈ 0.008
Conclusion: Plants A and B produce significantly different tensile strengths. Plant A averages 1.3 MPa higher — investigate process differences.
4 & 5 · Independent t-Test — Pooled & Welch's
2 groups · σ unknown
How to choose: First run an F-test or Levene's test for equal variances. If variances are equal → Pooled t (Test 4). If variances are unequal, or you are unsure → Welch's t (Test 5). When in doubt, Welch's is the safer default — it is slightly conservative but never wrong.
Test 4 — Pooled t (Equal Variances)
t = (x̄₁ − x̄₂) / [Sp × √(1/n₁ + 1/n₂)]
Sp² = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁+n₂−2)
df = n₁ + n₂ − 2
Symbol | Meaning
Sp | Pooled standard deviation — weighted average of s₁ and s₂
Sp² | Pooled variance — borrows strength from both samples
df | n₁+n₂−2 — more df means a narrower t-distribution, easier to reject
Test 5 — Welch's t (Unequal Variances)
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)]
Symbol | Meaning
s₁², s₂² | Individual sample variances — not pooled
df (Welch) | Welch-Satterthwaite equation — df is non-integer, usually lower than the pooled df
Engineering Example — Two Welding Processes
Scenario: Two MIG welding processes are compared for joint strength (kN). F-test confirms equal variances. Are the mean strengths different? α = 0.05, two-tail.
Data
Process A: n=10, x̄=52.3, s=2.1
Process B: n=12, x̄=50.1, s=2.3
Pooled Variance
Sp² = (9×4.41 + 11×5.29) / 20
= (39.69 + 58.19) / 20 = 4.894
Sp = √4.894 = 2.212
Test Statistic
SE = 2.212 × √(1/10 + 1/12) = 2.212 × 0.4282 = 0.947
t = (52.3 − 50.1) / 0.947 = 2.32
df = 10+12−2 = 20
Decision
tcrit(20df, α=0.05) = ±2.086
2.32 > 2.086 → Reject H₀
Conclusion: Process A produces significantly stronger joints than Process B (mean difference = 2.2 kN, p ≈ 0.031). The pooled t-test was appropriate because the F-test confirmed equal variances (F = 1.20 < Fcrit = 3.07).

If variances were unequal: Switch to Welch's t. Welch's df would be approximately 19.7 (non-integer) — slightly fewer df, slightly wider critical region, but still p < 0.05 in this case.
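SciPy can run both variants directly from the summary statistics above, without the raw data (a sketch):

```python
from scipy.stats import ttest_ind_from_stats

# Process A: n=10, mean=52.3, s=2.1 · Process B: n=12, mean=50.1, s=2.3
pooled = ttest_ind_from_stats(52.3, 2.1, 10, 50.1, 2.3, 12, equal_var=True)
welch = ttest_ind_from_stats(52.3, 2.1, 10, 50.1, 2.3, 12, equal_var=False)
print(f"pooled: t = {pooled.statistic:.2f}, p = {pooled.pvalue:.3f}")  # t = 2.32
print(f"Welch : t = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")
```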
6 · Paired t-Test
Same subjects · Before / After
When to Use This Test
✓ The same units / subjects are measured twice — before and after a treatment
Matched pairs — two sensors measuring same part, left vs right side, twin studies
More powerful than 2-sample t — eliminates between-subject variability by focusing only on within-subject change
✗ NOT for independent groups — pairing where none exists inflates Type I error
The Formula — Reduce to a One-Sample t on the Differences
Step 1: compute differences   dᵢ = Y₁ᵢ − Y₂ᵢ
t = d̄ / (s_d / √n)
df = n − 1    H₀: μ_d = 0
Symbol | Meaning | How to calculate
dᵢ | Individual differences | dᵢ = Y₁ᵢ − Y₂ᵢ for each pair i
d̄ | Mean of the differences | d̄ = Σdᵢ / n
s_d | Standard deviation of differences | s_d = √[Σ(dᵢ−d̄)² / (n−1)]
s_d/√n | Standard error of the mean difference | How precisely d̄ estimates μ_d
Engineering Example — Vibration Damper Before / After
Scenario: A new vibration damper is fitted to 8 identical machine tools. Vibration amplitude (mm/s) is measured before and after on the same machine. Did the damper reduce vibration? α = 0.05, one-tailed (with d = before − after, a reduction means μ_d > 0).
Machine | Before (Y₁) | After (Y₂) | d = Y₁−Y₂ | (d − d̄)²
1 | 8.4 | 6.9 | +1.5 | 0.0039
2 | 7.1 | 5.8 | +1.3 | 0.0689
3 | 9.2 | 7.4 | +1.8 | 0.0564
4 | 6.5 | 5.2 | +1.3 | 0.0689
5 | 8.8 | 7.1 | +1.7 | 0.0189
6 | 7.6 | 6.0 | +1.6 | 0.0014
7 | 9.0 | 7.5 | +1.5 | 0.0039
8 | 8.1 | 6.3 | +1.8 | 0.0564
Sum / Mean | | | d̄ = 1.5625 | Σ = 0.2788
Standard Deviation of Differences
s_d = √(0.2788 / 7) = √0.03983 = 0.1996
Test Statistic
t = 1.5625 / (0.1996/√8)
t = 1.5625 / 0.0706 = 22.1
Critical Value (df = 7)
tcrit(7 df, one-tail, α=0.05) = 1.895
(reject if t > +1.895, since d = before − after and H₁: μ_d > 0)
Decision & Conclusion
22.1 ≫ 1.895 → Reject H₀
The damper significantly reduces vibration.
Mean reduction: 1.56 mm/s (19.3% of the before mean)
Before vs After — All Differences Positive
[Slope chart: mean before = 8.09, mean after = 6.53 mm/s — all 8 lines slope downward, a consistent, significant reduction]
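The same test in one call (Python with SciPy ≥ 1.6; alternative='greater' tests μ_d > 0 for d = before − after):

```python
from scipy import stats

before = [8.4, 7.1, 9.2, 6.5, 8.8, 7.6, 9.0, 8.1]
after = [6.9, 5.8, 7.4, 5.2, 7.1, 6.0, 7.5, 6.3]

res = stats.ttest_rel(before, after, alternative="greater")
print(f"t = {res.statistic:.1f}, p = {res.pvalue:.2e}")   # t = 22.1, p << 0.001
```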
7 · One-Way ANOVA — Comparing 3 or More Means
k ≥ 3 groups · F-statistic
When to Use This Test
✓ Comparing means of 3 or more independent groups simultaneously
✓ One categorical factor (treatment) with k levels
⚡ Running multiple t-tests instead inflates α — k=4 groups requires 6 pairwise tests → family error = 1−0.95⁶ = 26%
📌 A significant F only tells you at least one pair differs — follow with Tukey HSD to find which pairs (Family ②)
The Formula — Decomposing Total Variation
F = MS_Between / MS_Within
MS_Between = SS_B / (k−1)     MS_Within = SS_W / (N−k)
Symbol | Meaning | Formula
SS_Between | Variation due to the factor (between groups) | Σ nᵢ (x̄ᵢ − x̄)²
SS_Within | Variation within groups (random error) | Σ Σ (xᵢⱼ − x̄ᵢ)²
MS_Between | Mean square between — treatment effect estimate | SS_B / (k−1)
MS_Within | Mean square within — pure noise estimate | SS_W / (N−k)
F | Ratio of treatment signal to noise. If H₀ is true, F ≈ 1. Large F → groups differ. | MS_B / MS_W
Engineering Example — 3 Adhesive Curing Temperatures
Scenario: An adhesive bond strength (MPa) is measured at 3 curing temperatures. 4 specimens per group. Does curing temperature significantly affect bond strength? α = 0.05.
120°C (A) | 150°C (B) | 180°C (C)
12.1 | 15.3 | 10.8
11.8 | 16.1 | 11.2
12.5 | 14.8 | 10.5
12.2 | 15.6 | 10.9
x̄ = 12.15 | x̄ = 15.45 | x̄ = 10.85
Grand Mean
x̄ = (12.15 + 15.45 + 10.85)/3 = 12.817
SS_Between
SS_B = 4[(12.15−12.817)² + (15.45−12.817)² + (10.85−12.817)²]
= 4[0.444 + 6.934 + 3.868] = 44.99
SS_Within
SS_W = Σ(within-group deviations²)
= 0.25 + 0.89 + 0.25 = 1.39
ANOVA Table
Source | SS | df | MS | F
Between | 44.99 | 2 | 22.49 | 145.6
Within | 1.39 | 9 | 0.154 |
Total | 46.38 | 11 | |
Decision
Fcrit(2, 9, α=0.05) = 4.26
145.6 ≫ 4.26 → Reject H₀
p < 0.001
Conclusion: Curing temperature significantly affects bond strength. 150°C produces the highest mean (15.45 MPa). Now run Tukey HSD (Family ②) to confirm all three pairs are significantly different.
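A one-call check of the ANOVA table (Python with SciPy):

```python
from scipy import stats

a = [12.1, 11.8, 12.5, 12.2]   # 120 C
b = [15.3, 16.1, 14.8, 15.6]   # 150 C
c = [10.8, 11.2, 10.5, 10.9]   # 180 C

f, p = stats.f_oneway(a, b, c)
print(f"F = {f:.1f}, p = {p:.2e}")   # F = 145.6, p << 0.001
```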
8 · Two-Way ANOVA
Two factors tested simultaneously
F_A = MS_A / MS_Error
F_B = MS_B / MS_Error
F_AB = MS_AB / MS_Error
  • ✓ 2 categorical factors (A and B)
  • ✓ Tests main effects A, B, and interaction AB
  • ✓ More efficient than two separate one-way ANOVAs
  • ⚡ Detects synergy/interference between factors
Example: Machine type (A: 3 models) × Operator shift (B: Day/Night) → Tensile strength. Two-Way ANOVA reveals whether a specific machine performs better on a specific shift — an AB interaction.
9 · Repeated Measures ANOVA
Same subjects, 3+ time points
SS_Within = SS_Treatment + SS_Error
F = MS_Treatment / MS_Error
Check sphericity with Mauchly's test
  • ✓ Same subjects measured at k ≥ 3 time points
  • ✓ Removes between-subject variation → more power
  • ✓ Assumes sphericity (equal variance of differences)
  • 📌 Non-parametric alternative: Friedman Test (Family ⑤)
Example: 10 operators measured at 4 time points during a shift. RM-ANOVA tests whether fatigue causes a significant and consistent change in accuracy over time across all operators.
Family ② — 5 Tests
Post-Hoc Tests

Run ONLY after a significant ANOVA F-test. ANOVA tells you at least one pair differs — post-hoc tests identify which pairs. Running them without a significant F first inflates Type I error and produces false positives.

Why You Cannot Just Run Multiple t-Tests — The α Inflation Problem
3 groups: C(3,2) = 3 comparisons → α_family = 14.3%. 5 groups: C(5,2) = 10 comparisons → α_family = 40.1%. 10 groups: C(10,2) = 45 comparisons → α_family ≈ 90%. Formula: α_family = 1 − (1−α)^m, where m = number of comparisons, each run at α = 0.05.
1 · Tukey's HSD — Default Choice for All Pairwise Comparisons
balanced design · equal n
When to Use
✓ After a significant one-way ANOVA F-test
✓ All pairwise comparisons needed simultaneously
✓ Equal group sizes (balanced design)
📌 Best power-to-α-control balance — the default post-hoc choice in most engineering settings
The Formula
HSD = qα,k,df_W × √(MS_W / n)
Reject H₀ for pair (i,j) if |x̄ᵢ − x̄ⱼ| > HSD
Symbol | Meaning | Detail
qα,k,df_W | Studentised range critical value | From the q-table: depends on α, k (number of groups), df_W (within-group df)
MS_W | Mean square within (from the ANOVA table) | Pooled error estimate — same value used in the ANOVA F-test
n | Group size (equal across groups) | Number of observations per treatment group
HSD | Honestly Significant Difference | The minimum difference required between two means to declare significance
Engineering Example — 3 Curing Temperatures for Adhesive
Scenario: One-Way ANOVA found F = 145.6, p < 0.001 (significant). Three curing temperatures: A=120°C (x̄=12.15 MPa), B=150°C (x̄=15.45 MPa), C=180°C (x̄=10.85 MPa). k=3 groups, n=4 per group, MS_W=0.1544, df_W=9. Which pairs are significantly different?
① Find q critical value
q(α=0.05, k=3, df_W=9) = 3.948
(from the Studentised range table)
② Calculate HSD
HSD = 3.948 × √(0.1544/4)
= 3.948 × 0.1965 = 0.776 MPa
③ Compare all pairs vs HSD
Pair | Abs. diff | Significant?
B vs C | 4.60 | ✓ YES
B vs A | 3.30 | ✓ YES
A vs C | 1.30 | ✓ YES
Conclusion: All three temperatures produce significantly different bond strengths. Optimal: 150°C (B) at 15.45 MPa.
Mean Comparison Plot — HSD Intervals
[Mean comparison plot: C = 10.85, A = 12.15, B = 15.45 MPa with ±HSD/2 error bars — every pairwise gap exceeds the HSD of 0.776]
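SciPy (≥ 1.8) ships a Tukey HSD routine, so the table above can be checked directly (a sketch):

```python
from scipy import stats

a = [12.1, 11.8, 12.5, 12.2]   # 120 C
b = [15.3, 16.1, 14.8, 15.6]   # 150 C
c = [10.8, 11.2, 10.5, 10.9]   # 180 C

res = stats.tukey_hsd(a, b, c)
print(res)   # pairwise differences, confidence intervals and p-values
```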
2 · Bonferroni Correction
α* = α/m
✓ Pre-planned comparisons (you decided before data)
✓ Small m (≤5 comparisons) — more power than Tukey for few tests
✗ Large m (≥10) — becomes too conservative, misses real differences
📌 Universal — works for any type of test, not just ANOVA
α* = α / m
Use α* as the significance level for each individual test
Symbol | Meaning
α | Desired family-wise error rate (usually 0.05)
m | Total number of comparisons being made
α* | Adjusted threshold — use this for each individual t-test
Example — Same 3-Temperature ANOVA
m = 3 pairs, α = 0.05
α* = 0.05/3 = 0.0167
tcrit(df=9, α*=0.0167, two-tail) ≈ 2.933

Pair B vs A: t = (15.45−12.15)/√(2×0.1544/4)
= 3.30/0.278 = 11.9 > 2.933 → Significant
3 · Scheffé Test
complex contrasts
✓ Any linear contrast, not just pairwise comparisons
✓ Unplanned comparisons after viewing data
✗ Lowest power for simple pairs — Tukey is better for pairwise
📌 Most conservative — safest for data-dredging protection
F* = (k−1) × Fα, k-1, N-k
Critical value for any contrast — more demanding than Tukey
Symbol | Meaning
k | Number of groups in the ANOVA
Fα,k−1,N−k | Critical F from the original ANOVA test
F* | Scheffé critical value for any contrast
Example — Complex Contrast: B vs Average of A & C
F* = (3−1) × F(0.05, 2, 9) = 2 × 4.26 = 8.52

Contrast L = x̄_B − (x̄_A + x̄_C)/2
= 15.45 − (12.15 + 10.85)/2 = 3.95 MPa

F_contrast = L² / (MS_W × Σ(cᵢ²/nᵢ)), with c = (1, −½, −½): Σ(cᵢ²/nᵢ) = 1.5/4 = 0.375
= 3.95² / (0.1544 × 0.375) = 15.60 / 0.0579 ≈ 270
270 ≫ 8.52 → Significant
4 · Newman-Keuls (SNK) Step-Down Test
step-down · higher power
When to Use This Test
✓ After significant ANOVA — all pairwise comparisons needed
✓ Higher power than Tukey when k is large (4+ groups)
✗ Family-wise α not fully controlled — some Type I inflation possible
✗ Not recommended for confirmatory regulatory submissions — use Tukey
The Formula
q_p = (x̄_max − x̄_min) / SE
SE = √(MS_W / n)    p = span (number of means in range, from 2 to k)    df = N−k
SymbolMeaningDetail
q_pStudentised range statistic for a span of p meansCritical value CHANGES with p — wider spans use larger q_p
pNumber of means in the range being comparedp=2: adjacent pair · p=k: full range. Wider span = larger critical value.
SEStandard error of a group meanSE = √(MS_W/n) — same as in Tukey
Step-downProcedure orderCompare largest span first (p=k). If not significant, stop. Proceed inward only if larger span is significant.
Engineering Example
Scenario: Four coating processes (k=4): A=10.8, B=12.1, C=15.4, D=11.3 MPa. n=4 per group, MS_W=0.228, df_W=12. Run SNK after significant ANOVA.
① Rank means lowest → highest
A=10.8 < D=11.3 < B=12.1 < C=15.4
② SE and step-down q values
SE = √(0.228/4) = 0.2387
q(p=4, df=12, 0.05) = 4.199 → R₄ = 1.002
q(p=3, df=12, 0.05) = 3.773 → R₃ = 0.900
q(p=2, df=12, 0.05) = 3.082 → R₂ = 0.736
③ Start widest span (p=4): A vs C
|15.4−10.8| = 4.60 > 1.002 → Sig ✓
Advantage over Tukey
Adjacent pairs use smaller q (3.082 vs 4.199) — more power to detect close means. Trade-off: slight α inflation for distant pairs.
Key insight — why step-down works: The procedure uses progressively smaller critical values as spans narrow. For adjacent means (p=2), q=3.082 gives a much tighter threshold than Tukey's fixed q=4.199. This is why SNK detects differences that Tukey misses — but at the cost of slightly elevated Type I error for pairs that span many means.

When to use vs Tukey: Exploratory manufacturing studies where maximising detection matters more than strict family-wise α control. For process improvement decisions (not regulatory submissions).
5 · Duncan's Multiple Range Test
liberal · exploratory only
When to Use This Test
✓ Highest statistical power — detects the smallest real differences between means
✓ Exploratory research where missing a real effect is the greater concern
✗ Weakest family-wise α control — highest false positive rate of all 5 post-hoc tests
✗ Not appropriate for confirmatory engineering or regulatory studies — use Tukey
The Formula
α_p = 1 − (1−α)^(p−1)
Protection level varies with range p. Smallest ranges use nominal α; larger ranges allow higher error.
Symbol | Meaning | Detail
p | Number of means in the comparison range | p=2: both means adjacent in ranking · p=k: entire range
α_p | Effective significance level for a span of p means | α_p increases with p — least conservative for wide spans
R_p | Critical range for span p | R_p = q_p(α_p, df) × SE — smaller than Tukey at each step
Engineering Example
Scenario: Same 4-process data. Duncan's uses α_p = 1−(1−0.05)^(p−1) at each step, giving critical ranges that are tighter than both Tukey and SNK. Shows how liberal the test is.
For p=2: α_2=1−0.95^1=0.050 → q_2=3.082
For p=3: α_3=1−0.95^2=0.098 → q_3=2.779
For p=4: α_4=1−0.95^3=0.143 → q_4=2.663

Duncan R_p (SE=0.2387):
R_2=0.736 R_3=0.663 R_4=0.635

Compare to Tukey HSD=1.002 for all spans.
Duncan flags more pairs as significant.
Critical warning: Duncan's test is the most liberal post-hoc test available. It does NOT control family-wise error rate in the traditional sense. With k=10 groups, the effective α for distant comparisons can exceed 40%. Use only in exploratory biological or agricultural research where power is paramount.
Test | α control | Power
Tukey | Exact ✓ | High
Bonferroni | Conservative ✓ | Moderate
Newman-Keuls | Partial ⚠ | Higher
Duncan | Weakest ✗ | Highest
Family ③ — 7 Tests
Proportions & Counts Tests

Use when your data is categorical — pass/fail, defect type, yes/no, attribute data. You are counting frequencies or testing proportions, not measuring a continuous response. The test statistic follows a z or χ² distribution.

1 · One-Proportion z-Test
binary outcome · np₀≥5
When to Use
✓ Binary outcome (defective/good, pass/fail, yes/no)
✓ Testing if a proportion equals a known standard p₀
✓ Both n×p₀ ≥ 5 AND n×(1−p₀) ≥ 5 (sample large enough)
✗ Small n where np₀ < 5 — use Fisher's Exact Test instead
The Formula
z = (p̂ − p₀) / √(p₀(1−p₀)/n)
Symbol | Meaning | Detail
p̂ | Sample proportion | p̂ = x/n where x = number of successes in a sample of n
p₀ | Hypothesised proportion | The known or target proportion under H₀ (e.g. historical defect rate)
√(p₀(1−p₀)/n) | Standard error of p̂ | How much the sample proportion varies by chance around p₀
z | Standardised test statistic | Compared to zα = 1.645 (one-tail) or zα/2 = 1.960 (two-tail)
Engineering Example — New Supplier Defect Rate
Scenario: A component supplier has a historical defect rate of p₀ = 0.04 (4%). A new production batch of n = 250 parts is received. 14 defects are found. Has the defect rate increased significantly? α = 0.05, upper one-tail test (we only care if it's higher).
① Hypotheses
H₀: p ≤ 0.04    H₁: p > 0.04
② Sample Proportion
p̂ = 14/250 = 0.056
③ Standard Error
SE = √(0.04×0.96/250)
= √(0.0001536) = 0.01239
④ Test Statistic
z = (0.056−0.04)/0.01239
z = 0.016/0.01239 = 1.29
⑤ Decision
zcrit = 1.645 (upper, α=0.05)
1.29 < 1.645 → Fail to reject H₀
p-value = 0.098
Conclusion: No statistical evidence the defect rate has increased (p=0.098). But borderline — monitor the next batch. Detecting a 4%→5.6% shift reliably (80% power, one-tail α=0.05) would take roughly n ≈ 1,000 parts.
Upper-Tail Rejection Region
[Figure: upper-tail rejection region; z=1.29 does not cross z_crit=1.645, so H₀ is not rejected (p=0.098)]
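In Python, the same test is a few lines with scipy.stats.norm:

from math import sqrt
from scipy.stats import norm

# One-proportion z-test for the supplier example above
p0, n, x = 0.04, 250, 14
p_hat = x / n                        # 0.056
se = sqrt(p0 * (1 - p0) / n)         # 0.0124, standard error under H0
z = (p_hat - p0) / se                # 1.29
print(z, norm.sf(z))                 # upper-tail p ≈ 0.098 → fail to reject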
2 · Two-Proportion z-Test
2 independent groups · pooled p
When to Use
✓ Comparing defect rates, pass rates, or proportions from two independent processes/lines/suppliers
✓ All four counts: n₁p̂₁, n₁(1−p̂₁), n₂p̂₂, n₂(1−p̂₂) all ≥ 5
✗ Paired samples (same parts inspected twice) — use McNemar's instead
📌 Use pooled proportion p̄ under H₀: p₁=p₂ — assumes equality under null
The Formula
z = (p̂₁ − p̂₂) / √[ p̄(1−p̄)(1/n₁ + 1/n₂) ]
p̄ = (x₁ + x₂) / (n₁ + n₂)   ← pooled proportion
Symbol | Meaning | Detail
p̂₁, p̂₂ | Sample proportions | Defect rates, pass rates, etc. from each group
p̄ | Pooled proportion | Combined proportion assuming H₀: p₁=p₂ is true — the best estimate of the common p
√[p̄(1−p̄)(1/n₁+1/n₂)] | Pooled standard error | Uncertainty in the difference p̂₁−p̂₂ under H₀
Engineering Example — Comparing Two Assembly Lines
Scenario: Line 1: n₁=200, 18 defects → p̂₁=0.090. Line 2: n₂=180, 9 defects → p̂₂=0.050. Are the defect rates significantly different? α=0.05, two-tail.
① Pooled Proportion
p̄ = (18+9)/(200+180) = 27/380 = 0.0711
② Pooled SE
SE = √(0.0711×0.9289×(1/200+1/180))
= √(0.06603×0.01056) = 0.02641
③ Test Statistic
z = (0.090−0.050)/0.02641 = 1.515
④ Decision
zcrit=±1.960 (two-tail)  p≈0.130
1.515<1.960 → Fail to reject H₀
Conclusion: No significant difference at α=0.05. The 4% gap (9%−5%) is not statistically significant with these sample sizes. Need n≈700 per line to detect reliably.
Proportion Comparison Visual
[Figure: bar comparison; Line 1 at 9.0% vs Line 2 at 5.0%; the 4% gap is not significant at this sample size]
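In Python, the pooled two-proportion test follows the same pattern:

from math import sqrt
from scipy.stats import norm

# Two-proportion z-test for the two assembly lines above
x1, n1, x2, n2 = 18, 200, 9, 180
p1, p2 = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)                    # pooled proportion 0.0711
se = sqrt(p_bar * (1 - p_bar) * (1/n1 + 1/n2))   # pooled SE 0.0264
z = (p1 - p2) / se                               # 1.515
print(z, 2 * norm.sf(abs(z)))                    # two-tail p ≈ 0.130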
3 · χ² Goodness of Fit
df = k−1
✓ 1 categorical variable, k categories
✓ Does observed distribution match expected?
✓ All expected counts Eᵢ ≥ 5
📌 Tests uniformity, historical match, or theoretical fit
χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
df = k − 1    Eᵢ = n × pᵢ (expected under H₀)
Symbol | Meaning
Oᵢ | Observed count in category i
Eᵢ | Expected count under H₀: Eᵢ = n × p₀ᵢ
(O−E)²/E | Squared standardised deviation — large when observed departs from expected
Example — Defect Distribution by Day of Week
Does defect frequency depend on weekday? n=250 defects across 5 days. Expected: 50/day (uniform). Observed: Mon=62, Tue=48, Wed=44, Thu=51, Fri=45.
χ²=(62−50)²/50+(48−50)²/50+(44−50)²/50+(51−50)²/50+(45−50)²/50
=2.880+0.080+0.720+0.020+0.500 = 4.20
χ²crit(df=4, α=0.05) = 9.488
4.20 < 9.488 → Fail to reject H₀
No evidence defects depend on weekday
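In Python, scipy.stats.chisquare reproduces the calculation:

from scipy.stats import chisquare

# Goodness-of-fit for the day-of-week defect counts (uniform expectation)
stat, p = chisquare([62, 48, 44, 51, 45], f_exp=[50] * 5)
print(stat, p)   # χ² = 4.20, df = 4, p ≈ 0.38 → fail to reject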
4 · χ² Test of Independence
df = (r−1)(c−1)
✓ Two categorical variables in a contingency table
✓ Are the two variables independent of each other?
✓ All expected cell frequencies ≥ 5
📌 Eᵢⱼ = (Row_i Total × Col_j Total) / Grand Total
χ² = Σᵢⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
Eᵢⱼ = (Row_i total × Col_j total) / n
Example — Defect Type vs Production Shift
Shift | Scratch | Dent | Crack | Total
Day | 18 | 12 | 5 | 35
Night | 22 | 8 | 15 | 45
Total | 40 | 20 | 20 | 80
E_Day,Crack = 35×20/80 = 8.75   E_Night,Crack = 45×20/80 = 11.25
χ² = (18−17.5)²/17.5 + (12−8.75)²/8.75 + (5−8.75)²/8.75 + (22−22.5)²/22.5 + (8−11.25)²/11.25 + (15−11.25)²/11.25
χ² = 0.014 + 1.207 + 1.607 + 0.011 + 0.939 + 1.250 = 5.03   df = (2−1)(3−1) = 2
χ²crit(2df, α=0.05) = 5.991
5.03 < 5.991 → Fail to reject H₀
No significant association between defect type and shift at α=0.05. Cracks do cluster on the night shift, but this sample is too small to confirm the effect.
[Figure: χ²=5.03 falls short of χ²crit=5.991; fail to reject H₀]
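In Python, scipy.stats.chi2_contingency computes the expected counts, df, and χ² in one call (no Yates correction is applied to tables larger than 2×2):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[18, 12, 5],     # Day:   Scratch, Dent, Crack
                  [22,  8, 15]])   # Night: Scratch, Dent, Crack
stat, p, dof, expected = chi2_contingency(table)
print(stat, p, dof)                # χ² ≈ 5.03, p ≈ 0.081, df = 2
print(expected)                    # includes E(Day,Crack) = 8.75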
Family ④ — 5 Tests
Variance Tests

Use when you need to test spread, not location. Required before independent t-tests (equal variance assumption), before ANOVA (homogeneity of variance), when comparing measurement system precision, or when a spec limit exists on process variability.

Which Variance Test? — Decision Guide
1 sample vs spec
→ χ² Variance Test
Tests if σ² = σ₀² (known target). Uses chi-square distribution. Sensitive to normality.
2 groups, normal data
→ F-Test
Simple, exact, widely understood. Fails with non-normality. Run Shapiro-Wilk first.
2+ groups, any distribution
→ Levene's Test
Robust. Default pre-ANOVA check. Use Brown-Forsythe variant for heavily skewed data.
1 · F-Test for Two Variances
2 normal groups · larger s² on top
When to Use
✓ Two independent samples from normal distributions
✓ Testing if σ₁² = σ₂² (prerequisite before pooled t-test)
✗ Non-normal data — use Levene's Test instead
📌 Always put the larger s² in numerator → right-tail test only
The Formula
F = s₁² / s₂²   (s₁² ≥ s₂²)
df₁ = n₁ − 1 (numerator)    df₂ = n₂ − 1 (denominator)
Symbol | Meaning | Detail
s₁² | Larger sample variance (numerator) | Always put the larger variance on top to ensure F ≥ 1
s₂² | Smaller sample variance (denominator) | From the group with the smaller variance
df₁, df₂ | Degrees of freedom for each group | df = n − 1 for each group. Determines which F-distribution to use.
F | Ratio of variances | Under H₀ (equal variances), F ≈ 1. Large F → variances differ significantly.
Engineering Example — Two Moulding Machines
Scenario: Two injection moulding machines produce the same part. Machine A: n=10 parts, s=1.8mm. Machine B: n=8 parts, s=0.9mm. Do the machines have significantly different variation? α=0.05, two-tail (testing either direction).
① Hypotheses
H₀: σ_A²=σ_B²    H₁: σ_A²≠σ_B²
② Put larger variance on top
s_A=1.8mm > s_B=0.9mm
F = 1.8²/0.9² = 3.24/0.81 = 4.00
df₁=9 (Machine A), df₂=7 (Machine B)
③ Critical Value (upper tail at α/2 = 0.025)
Fcrit(df₁=9, df₂=7, α/2=0.025) = 4.82
(with the larger variance on top, a two-tail test at α puts α/2 in the upper tail; the α=0.05 point, 3.68, corresponds to a two-tail test at α=0.10)
④ Decision
4.00 < 4.82 → Fail to reject H₀ at α=0.05 (significant only at α=0.10)
Conclusion: Machine A shows 4× the variance of Machine B, suggestive but just short of two-tail significance at α=0.05. Treat the variances as potentially unequal: use Welch's t-test (not pooled) for comparing means, and investigate Machine A's process stability.
F(9,7) Distribution — Right-Tail Test
[Figure: F(9,7) density; F=4.00 lies between the α=0.10 point (3.68) and the α=0.05 point (4.82), a borderline result]
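In Python, a sketch of the test with scipy.stats.f:

from scipy.stats import f

# F-test for the two moulding machines (larger variance on top)
s_a, n_a, s_b, n_b = 1.8, 10, 0.9, 8
F = s_a**2 / s_b**2                            # 4.00
df1, df2 = n_a - 1, n_b - 1                    # 9, 7
crit = f.ppf(0.975, df1, df2)                  # upper α/2 point ≈ 4.82
p_two_tail = 2 * min(f.sf(F, df1, df2), f.cdf(F, df1, df2))
print(F, crit, p_two_tail)                     # 4.00 < 4.82 → fail to reject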
2 · Levene's Test — Robust Equality of Variances
2+ groups · robust · any distribution
When to Use This Test
✓ 2 or more groups — checking equal variances before ANOVA or independent t-test
✓ Data may not be perfectly normal — Levene's is robust to non-normality
📌 The recommended default pre-ANOVA variance check in most engineering contexts
✗ Heavily skewed data with outliers — use Brown-Forsythe (median-based) instead
The Formula
zᵢⱼ = |Yᵢⱼ − Ȳᵢ|   →   Run One-Way ANOVA on z
Significant F on the z values means variances differ    df₁ = k−1    df₂ = N−k
Symbol | Meaning | Detail
Yᵢⱼ | Observation j from group i | Raw measurement value
Ȳᵢ | Mean of group i | The group mean used as centre. Replace with median(Yᵢ) for Brown-Forsythe.
zᵢⱼ | Absolute deviation from group mean | How spread out each observation is from its group centre
ANOVA on z | The test mechanism | If variances differ, the zᵢⱼ values differ systematically between groups — ANOVA detects this
Engineering Example
Scenario: Three injection moulding machines produce the same part. Before running ANOVA on mean dimensions, test if variances are equal. n=5 parts per machine. α=0.05.
① Compute group means
Machine A: Ȳ=12.15, B: Ȳ=15.45, C: Ȳ=10.85
② Compute zᵢⱼ = |Yᵢⱼ − Ȳᵢ|
A: [0.05,0.35,0.35,0.05,0.15]
B: [0.15,0.65,0.35,0.15,0.25]
C: [0.05,0.35,0.35,0.05,0.15]
③ Run ANOVA on z values
FLevene = 1.24    Fcrit(2,12) = 3.89
④ Decision
1.24 < 3.89 → Fail to reject H₀
Variances are equal — pooled ANOVA valid ✓
Why absolute deviations? The variance of a group is the mean squared deviation from the group centre. By taking |Yᵢⱼ−Ȳᵢ| as the new response, we convert "do variances differ?" into "do mean absolute deviations differ?" — a standard ANOVA question.

Interpreting the result: Fail to reject H₀ → equal variances → use pooled ANOVA or pooled t-test. Reject H₀ → use Welch's t-test or Welch's ANOVA.

In Minitab: Levene's runs automatically as part of One-Way ANOVA output. Look for "Test for Equal Variances" in the session window.
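In Python, scipy.stats.levene runs the same procedure. The arrays below are hypothetical stand-ins, since the example above lists only the derived z values rather than the raw measurements:

from scipy.stats import levene

machine_a = [12.1, 12.5, 11.8, 12.2, 12.0]   # hypothetical raw data
machine_b = [15.3, 16.1, 15.1, 15.6, 15.2]
machine_c = [10.8, 11.2, 10.5, 11.0, 10.7]
stat, p = levene(machine_a, machine_b, machine_c, center='mean')
print(stat, p)   # expect p > 0.05 here → treat variances as equal
# center='median' gives the Brown-Forsythe variant described below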
3 · Bartlett's Test — Most Powerful When Normal
confirmed normal · χ² statistic · df=k−1
When to Use This Test
✓ 2 or more groups with confirmed normal distributions
✓ Most powerful variance equality test when normality genuinely holds
✗ Non-normal data — breaks down badly; one outlier can create a false positive
✗ Unknown distribution — use Levene's or Brown-Forsythe instead
The Formula
χ² = [(N−k) ln(Sp²) − Σ(nᵢ−1) ln(sᵢ²)] / c
c = 1 + [Σ(1/(nᵢ−1)) − 1/(N−k)] / [3(k−1)]    Sp² = Σ(nᵢ−1)sᵢ²/(N−k)    df = k−1
Symbol | Meaning | Detail
Sp² | Pooled within-group variance | Weighted average of all sᵢ² — the common variance under H₀
c | Bartlett correction factor | Adjusts for unequal group sizes. c ≈ 1 for equal n.
ln(sᵢ²) | Log of each group variance | Taking logs constructs a chi-square statistic from the variance ratios
χ² | Test statistic | Large χ² → group variances spread widely from Sp². df = k−1.
Engineering Example
Scenario: 5 production batches of a polymer, n=8 per batch. Shapiro-Wilk confirms normality in all batches. Test if batch variances are equal before pooled ANOVA. α=0.05.
① Individual variances
s₁²=1.82, s₂²=2.14, s₃²=1.91, s₄²=2.05, s₅²=1.88
② Pooled variance Sp²
Sp² = 7×(1.82+2.14+1.91+2.05+1.88)/35 = 1.960
③ Compute χ²
c = 1 + [5×(1/7) − 1/35]/(3×4) = 1.057
Σln(sᵢ²) = 3.356
χ² = [35×ln(1.96) − 7×3.356] / 1.057 = [23.55 − 23.49] / 1.057 ≈ 0.06
④ Decision
χ²crit(4df, α=0.05) = 9.488
0.06 < 9.488 → Fail to reject H₀ (the five variances are nearly identical, so χ² is tiny)
Batch variances equal — pooled ANOVA valid ✓
Why log-variance? Taking ln(sᵢ²) linearises the relationship between variance and the chi-square distribution, allowing construction of a valid test statistic through the difference between pooled and individual log-variances.

Critical warning: If even one batch had non-normal data (e.g., contaminated samples creating bimodal shape), Bartlett's would flag it as unequal variances even if the underlying processes were identical. Always run Shapiro-Wilk on each group first.

Rule of thumb:
• Normal, symmetric → Bartlett's (most powerful)
• Unknown/any distribution → Levene's
• Skewed or outliers → Brown-Forsythe
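In Python, Bartlett's χ² can be computed straight from the group variances (scipy.stats.bartlett exists as well, but it takes the raw samples rather than the variances):

import numpy as np

def bartlett_chi2(variances, n):
    # Bartlett's statistic for k groups of equal size n, from the formula above
    k = len(variances)
    N = k * n
    sp2 = sum((n - 1) * s2 for s2 in variances) / (N - k)   # pooled variance
    c = 1 + (k / (n - 1) - 1 / (N - k)) / (3 * (k - 1))     # correction factor
    return ((N - k) * np.log(sp2)
            - (n - 1) * sum(np.log(s2) for s2 in variances)) / c

print(bartlett_chi2([1.82, 2.14, 1.91, 2.05, 1.88], n=8))   # ≈ 0.06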
4 · χ² Variance Test — One Sample vs Specification
1 sample · σ² vs target · df=n−1
When to Use This Test
✓ One sample — testing if the process variance meets a specification target σ₀²
✓ Answer: "Does this machine's precision meet the engineering spec?"
📌 Requires normal data — sensitive to non-normality unlike Levene's
✗ Two or more groups — use F-test or Levene's instead
The Formula
χ² = (n−1) × s² / σ₀²
df = n−1    Two-tail: reject if χ² < χ²(df, α/2) or χ² > χ²(df, 1−α/2)
Symbol | Meaning | Detail
n−1 | Degrees of freedom | Number of observations minus one
s² | Sample variance | Computed from your n measurements: s² = Σ(xᵢ−x̄)²/(n−1)
σ₀² | Target specification variance | The maximum allowable variance from engineering requirements
χ² | Test statistic | Under H₀ (σ²=σ₀²), follows chi-square(df=n−1). Right-skewed — upper tail for "variance too large".
Engineering Example
Scenario: A precision lathe must produce shafts with σ ≤ 0.020mm (σ₀²=0.0004mm²). Sample of n=20 shafts gives s=0.023mm (s²=0.000529mm²). Has the variance exceeded the specification? α=0.05, upper one-tail.
① Hypotheses
H₀: σ²≤0.0004   H₁: σ²>0.0004
② Test Statistic
χ²=(20−1)×0.000529/0.0004
=19×1.3225=25.13
③ Critical Value
χ²crit(19df, upper α=0.05)=30.14
④ Decision
25.13<30.14 → Fail to reject H₀
Cannot confirm σ² exceeds spec at α=0.05.
But s=0.023>0.020 — monitor closely.
Chi-square distribution shape: Right-skewed, bounded at zero. For a two-tail test (is variance exactly equal to target?), two critical values are needed: χ²(df, α/2) for the lower and χ²(df, 1−α/2) for the upper.

Two-tail example:
Lower: χ²(19, 0.025)=8.91
Upper: χ²(19, 0.975)=32.85
Reject H₀ if χ²<8.91 or χ²>32.85

Always confirm normality first using Shapiro-Wilk before applying this test.
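In Python, the whole test is a few lines with scipy.stats.chi2:

from scipy.stats import chi2

# One-sample variance test for the lathe example above
n, s2, sigma0_sq = 20, 0.023**2, 0.020**2
stat = (n - 1) * s2 / sigma0_sq          # 25.13
print(stat, chi2.ppf(0.95, n - 1))       # crit 30.14 → fail to reject
print(chi2.sf(stat, n - 1))              # upper-tail p ≈ 0.16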
5 · Brown-Forsythe Test — Outlier-Resistant Variance Equality
skewed data · median-based · robust
When to Use This Test
✓ 2+ groups — testing equal variances when data is skewed or has outliers
✓ Same procedure as Levene's but uses group median instead of mean — more robust
📌 Recommended over Levene's whenever skewness or extreme values are present
✗ Clean symmetric normal data — standard Levene's or Bartlett's is sufficient
The Formula
zᵢⱼ = |Yᵢⱼ − median(Yᵢ)|   →   One-Way ANOVA on z
Identical to Levene's except median replaces mean as the group centre    df₁=k−1    df₂=N−k
Symbol | Meaning | Detail
median(Yᵢ) | Median of group i | The key difference from Levene's. The median is resistant to outliers; the mean is not.
zᵢⱼ | Absolute deviation from group median | With the median as centre, outliers contribute only moderately to the z values
ANOVA on z | Same as Levene's step 2 | Run one-way ANOVA on the zᵢⱼ deviations. Significant F → unequal variances.
Engineering Example
Scenario: Three paint formulations tested for adhesion (MPa). Data is right-skewed. Test variance equality before ANOVA. α=0.05.
① Compute group medians
Formulation A: median=12.3
B: median=15.6   C: median=11.1
② Compute zᵢⱼ = |Yᵢⱼ − medianᵢ|
A: [0.2,0.5,0.4,0.1,0.3]
B: [0.4,1.2,0.6,0.3,0.5]
C: [0.1,0.2,0.3,0.1,0.2]
③ ANOVA on z values
FBF=3.12   Fcrit(2,12)=3.89
④ Decision
3.12<3.89 → Fail to reject H₀
Variances equal despite skewed data.
Pooled ANOVA appropriate ✓
Why median beats mean here: In right-skewed data, high outliers pull the group mean upward. Deviations from that pulled mean appear large, making Levene's test incorrectly flag unequal variances. The median is unaffected by outliers — the bulk of the data drives the result.

Quick decision guide:
• Normal, symmetric → Bartlett's (most powerful)
• Any distribution, no outliers → Levene's
• Skewed or outliers present → Brown-Forsythe

In software: Minitab and JMP both run Brown-Forsythe as part of the "Test for Equal Variances" output alongside Levene's. Use the B-F result when you see strong skewness.
Family ⑤ — 7 Tests
Non-Parametric Tests

Use when normality is badly violated with small n, data is ordinal (ranked), or outliers distort parametric tests. These tests rank the data instead of using raw values — they lose some power when normality holds, but are robust and honest when it doesn't.

Parametric vs Non-Parametric — When to Switch
Situation | Parametric Test | Non-Parametric Alternative | What It Tests
1 sample or paired, non-normal | 1-sample / paired t | Wilcoxon Signed-Rank | Median = target; or median difference = 0
2 independent groups, non-normal | Independent t | Mann-Whitney U | Same distribution / median in both groups
3+ independent groups, non-normal | One-Way ANOVA | Kruskal-Wallis H | Same distribution across all groups
Repeated measures, non-normal | RM-ANOVA | Friedman Test | Same distribution across conditions
Direction of effect only | 1-sample t | Sign Test | P(positive change) = 0.5
Monotonic relationship | Pearson r | Spearman ρ / Kendall τ | Rank correlation (not just linear)
1 · Wilcoxon Signed-Rank Test
1 sample or paired · ranks |dᵢ|
When to Use
✓ One sample or paired data — non-normal distribution
✓ Ordinal scale — you can rank data but not assume normal errors
✓ More powerful than Sign Test — uses both sign AND magnitude of differences
✗ Two independent groups — use Mann-Whitney U instead
The Algorithm
W⁺ = Σ ranks of positive differences
W⁻ = Σ ranks of negative differences
T = min(W⁺, W⁻)
Reject H₀ if T ≤ W_critical (from Wilcoxon table)
Step | What to do | Detail
1 | Compute differences | dᵢ = Yᵢ − μ₀ (1-sample) or dᵢ = Y₁ᵢ − Y₂ᵢ (paired)
2 | Remove zero differences | Drop any dᵢ = 0. Reduce n accordingly.
3 | Rank absolute values | Rank |dᵢ| from 1 (smallest) to n (largest). Average the ranks of ties.
4 | Attach original signs | W⁺ = sum of ranks where dᵢ > 0; W⁻ = sum of ranks where dᵢ < 0
5 | Test statistic T | T = min(W⁺, W⁻). Reject H₀ if T ≤ T_critical from the table.
Engineering Example — Hardness Specification Check
Scenario: Target hardness = 50 HRC. 8 hardened steel parts measured. Distribution is unknown/skewed — Shapiro-Wilk suggests non-normality. Test if median = 50 HRC. α=0.05, two-tail.
Part | Y | d = Y−50 | |d| | Rank | Signed Rank
1 | 53.2 | +3.2 | 3.2 | 5.5* | +5.5
2 | 47.8 | −2.2 | 2.2 | 3 | −3
3 | 55.1 | +5.1 | 5.1 | 7 | +7
4 | 49.1 | −0.9 | 0.9 | 1 | −1
5 | 51.8 | +1.8 | 1.8 | 2 | +2
6 | 56.4 | +6.4 | 6.4 | 8 | +8
7 | 52.4 | +2.4 | 2.4 | 4 | +4
8 | 46.8 | −3.2 | 3.2 | 5.5* | −5.5
*Tie: parts 1&8 both |d|=3.2 → avg rank (5+6)/2=5.5
W⁺ = 5.5+7+2+8+4 = 26.5
W⁻ = 3+1+5.5 = 9.5
T = min(26.5, 9.5) = 9.5
Decision
T_crit(n=8, α=0.05 two-tail) = 3
9.5 > 3 → Fail to reject H₀
Median consistent with 50 HRC
Signed Ranks — Visual Balance
[Figure: signed-rank balance; W⁺=26.5 vs W⁻=9.5; T=9.5 > T_crit=3, fail to reject]
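In Python, scipy.stats.wilcoxon reproduces T = 9.5 (with a tie present it falls back to the normal approximation and issues a warning):

import numpy as np
from scipy.stats import wilcoxon

# Signed-rank test of H0: median = 50 HRC for the hardness data above
y = np.array([53.2, 47.8, 55.1, 49.1, 51.8, 56.4, 52.4, 46.8])
stat, p = wilcoxon(y - 50)     # statistic = min(W+, W−) = 9.5
print(stat, p)                 # p > 0.05 → fail to reject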
2 · Mann-Whitney U Test
2 independent groups
✓ Two independent groups, non-normal
✓ Ordinal data or continuous with outliers
📌 Also known as Wilcoxon rank-sum test
✗ Paired data — use Wilcoxon Signed-Rank
U₁ = n₁n₂ + n₁(n₁+1)/2 − W₁
U₂ = n₁n₂ + n₂(n₂+1)/2 − W₂
U = min(U₁, U₂)
W₁ = sum of ranks for group 1 (all obs ranked together)
Symbol | Meaning
W₁ | Sum of ranks for group 1 in the combined ranking of all n₁+n₂ observations
U | Count of times a group 1 observation precedes a group 2 observation in ranked order. U=0 → perfect separation.
Example — Cycle Times: Old vs New Process
Old: 42,51,48,55,49 (n₁=5)  New: 38,44,41,39,43 (n₂=5)
Combined rank all 10: New dominates lower ranks
W₁(Old)=38, W₂(New)=17
U₁=5×5+5×6/2−38=25+15−38=2
U₂=5×5+5×6/2−17=25+15−17=23
U=min(2,23)=2
Ucrit(5,5, α=0.05 one-tail)=4  2≤4 → Reject H₀
New process significantly faster
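In Python, scipy.stats.mannwhitneyu on the same data:

from scipy.stats import mannwhitneyu

old = [42, 51, 48, 55, 49]
new = [38, 44, 41, 39, 43]
U, p = mannwhitneyu(new, old, alternative='less')   # one-tail: new < old
print(U, p)   # U = 2, exact p ≈ 0.016 → reject H0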
3 · Kruskal-Wallis H Test
3+ groups · χ²(k−1)
H = [12/N(N+1)] × Σ(Rᵢ²/nᵢ) − 3(N+1)
Rᵢ = sum of ranks for group i   df = k−1
Example — 3 Suppliers, Delivery Time
3 suppliers, 5 deliveries each (days): non-normal
Rank all 15 combined → R₁=52, R₂=38, R₃=30
H=[12/(15×16)]×(52²/5+38²/5+30²/5)−3×16
H=[0.05]×(540.8+288.8+180)−48
H=50.48−48=2.48
χ²crit(2df,0.05)=5.991  2.48<5.991
Fail to reject H₀ — suppliers similar
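In Python, the H statistic can be computed directly from the rank sums above:

def kruskal_H(rank_sums, group_sizes):
    # H = [12 / N(N+1)] × Σ(Rᵢ²/nᵢ) − 3(N+1)
    N = sum(group_sizes)
    s = sum(R**2 / n for R, n in zip(rank_sums, group_sizes))
    return 12 / (N * (N + 1)) * s - 3 * (N + 1)

print(kruskal_H([52, 38, 30], [5, 5, 5]))   # ≈ 2.48 → fail to reject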
5 · Sign Test — Simplest Non-Parametric Median Test
1 sample · direction only · binomial
When to Use This Test
✓ Minimal data requirements — only the direction of difference (+ or −) can be recorded
✓ Very small n where even Wilcoxon assumptions may not hold
📌 Based on the binomial distribution — no ranking, no magnitudes required
✗ You can measure magnitude of differences — use Wilcoxon Signed-Rank (more powerful)
The Formula
B = count of + signs    B ~ Binomial(n, 0.5)
Under H₀: P(+) = P(−) = 0.5    Discard zero differences    Use binomial table or exact p-value
Symbol | Meaning | Detail
B | Count of positive differences | dᵢ = Yᵢ − μ₀. Count all dᵢ > 0. Ignore dᵢ = 0.
n | Effective sample size | Total observations minus the number of ties (zeros)
Binomial(n, 0.5) | The reference distribution | Under H₀ (median=μ₀), each difference is equally likely to be + or −
p-value | Exact probability | P(B ≥ observed) for the upper tail; two-tail: 2 × min(P(≤b), P(≥b))
Engineering Example
Scenario: A new lubricant is tested on 10 machines. Only direction of change in cycle time (faster/slower) is recorded — not the exact change. Does the lubricant reduce cycle time? α = 0.05, lower one-tail.
① Record signs only
Machine: 1 2 3 4 5 6 7 8 9 10
Change: − − + − − 0 − − + −
(0 discarded → n=9)
② Count positives
B = 2 (machines 3 and 9 got slower)
n = 9 (after discarding machine 6)
③ Exact p-value (lower tail)
P(B ≤ 2 | n=9, p=0.5)
= P(0)+P(1)+P(2) = 0.002+0.018+0.070
= 0.090
④ Decision
p = 0.090 > 0.05 → Fail to reject H₀
Insufficient evidence lubricant reduces cycle time
(7 of 9 improved — but not significant at α=0.05)
Why the Sign Test is weak: It throws away all magnitude information. Machine 8 might have improved by 30 seconds and machine 3 might have worsened by 0.1 seconds — the Sign Test treats them identically. This is why Wilcoxon is almost always preferred when you can measure the actual differences.

When the Sign Test is the right choice:
• Only direction was recorded in the data collection
• The comparison is ordinal ("better or worse" with no scale)
• Very small n (n < 6) where even Wilcoxon has almost no power
• Quick screening to confirm direction before a proper study

Power comparison (same data): Sign Test: p=0.090. Wilcoxon Signed-Rank: p≈0.025 (significant). The Sign Test missed a real effect.
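In Python, the exact sign test is a single binomial call (scipy.stats.binomtest, SciPy ≥ 1.7):

from scipy.stats import binomtest

# 2 positive signs out of n=9 non-zero differences, lower one-tail
result = binomtest(k=2, n=9, p=0.5, alternative='less')
print(result.pvalue)   # P(B ≤ 2 | n=9) = 0.090 → fail to reject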
6 & 7 · Spearman ρ and Kendall τ — Rank Correlations
monotonic relationship · ordinal · outliers present
When to Use This Test
✓ Relationship between two variables is monotonic but not necessarily linear
✓ Data is ordinal, or continuous but non-normal, or outliers are present
📌 Spearman ρ: non-parametric version of Pearson r — faster to compute, more familiar
📌 Kendall τ: probability interpretation — more meaningful for small n and many ties
The Formula
ρ = 1 − 6Σdᵢ² / [n(n²−1)]
Kendall τ = (C − D) / [n(n−1)/2]    C = concordant pairs    D = discordant pairs
Symbol | Meaning | Detail
dᵢ | Rank difference for pair i | dᵢ = rank(Xᵢ) − rank(Yᵢ) — rank each variable separately, then subtract
Σdᵢ² | Sum of squared rank differences | Large Σdᵢ² → ranks are misaligned → low correlation
C | Concordant pairs (Kendall) | Pairs (i, j) where the X and Y orderings agree: (Xᵢ−Xⱼ)(Yᵢ−Yⱼ) > 0
D | Discordant pairs (Kendall) | Pairs where the X rank order disagrees with the Y rank order
τ interpretation | Probability | τ = P(concordant) − P(discordant). τ=0.6 means 60% more concordant than discordant pairs.
Engineering Example
Scenario: An engineer ranks 8 circuit boards by visual quality (1=worst, 8=best) and measures their failure time (hours). Is quality rank correlated with failure time? Non-normal distribution. α = 0.05.
① Data and ranks
Board | Quality Rank X | Failure hr | Rank Y | dᵢ | dᵢ²
A | 1 | 420 | 2 | −1 | 1
B | 2 | 380 | 1 | +1 | 1
C | 3 | 580 | 4 | −1 | 1
D | 4 | 620 | 5 | −1 | 1
E | 5 | 490 | 3 | +2 | 4
F | 6 | 710 | 6 | 0 | 0
G | 7 | 820 | 7 | 0 | 0
H | 8 | 910 | 8 | 0 | 0
Σdᵢ² = 8
② Compute ρ
ρ = 1 − 6×8 / [8×(64−1)]
= 1 − 48/504 = 1 − 0.095 = 0.905
③ Test significance (n=8)
t = ρ√(n−2)/√(1−ρ²)
= 0.905×√6/√(1−0.819)
= 0.905×2.449/0.425 = 5.21
tcrit(6df, 0.05) = 2.447
④ Decision
5.21 > 2.447 → Reject H₀
Significant rank correlation (ρ=0.905)
Higher visual quality → longer failure time
Spearman vs Pearson: Pearson r measures linear relationship using raw values. Spearman ρ measures monotonic relationship using ranks. For ordinal data or continuous data with outliers, Spearman is more appropriate — one extreme outlier can dominate Pearson but has only rank ±1 effect on Spearman.

Spearman vs Kendall:
• ρ is more familiar and easier to compute
• τ has a clearer probability interpretation (P(concordant) − P(discordant))
• τ is more appropriate when many ties exist
• For n > 30 with few ties: ρ ≈ 3τ/2

Significance testing: For n > 10, use the t-statistic shown above. For n ≤ 10, use exact Spearman critical value tables.
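In Python, scipy.stats.spearmanr and kendalltau on the board data above:

from scipy.stats import spearmanr, kendalltau

quality = [1, 2, 3, 4, 5, 6, 7, 8]                  # visual quality rank
hours = [420, 380, 580, 620, 490, 710, 820, 910]    # failure time, hr
rho, p_rho = spearmanr(quality, hours)              # ρ ≈ 0.905
tau, p_tau = kendalltau(quality, hours)
print(rho, p_rho, tau, p_tau)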
Family ⑥ — 8 Tests
Correlation, Regression & Normality Tests

Test relationships between variables (correlation, regression), validate model assumptions (normality, independence of residuals), and compare survival or reliability curves. These tests are prerequisites for and extensions of the parametric means tests in Family ①.

1 · Pearson Correlation (r)
linear relationship · both normal
When to Use
✓ Both variables are continuous and approximately normal
✓ Testing if there is a linear relationship between X and Y
✗ r=0 does NOT mean no relationship — only no linear one. Always plot first.
📌 Non-parametric alternative: Spearman ρ (Family ⑤) for ordinal or non-normal data
The Formula
r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √[Σ(xᵢ−x̄)² × Σ(yᵢ−ȳ)²]
Test H₀: ρ=0 using   t = r√(n−2) / √(1−r²)   df = n−2
Symbol | Meaning | Detail
r | Sample correlation coefficient | −1 ≤ r ≤ +1. Perfect negative = −1, none = 0, perfect positive = +1
Σ(xᵢ−x̄)(yᵢ−ȳ) | Sample covariance (unnormalised) | Measures how X and Y vary together. Positive = both increase together.
r² (R²) | Coefficient of determination | Proportion of variance in Y explained by X. r=0.78 → R²=0.61 (61% explained).
t = r√(n−2)/√(1−r²) | Test statistic for H₀: ρ=0 | Follows a t-distribution with df = n−2. Use a standard t-table.
Engineering Example — Temperature vs Viscosity
Scenario: A polymer process engineer measures melt temperature (°C) and melt viscosity (Pa·s) for n=20 samples. Is there a significant linear correlation? α=0.05.
① Calculate r
r = 0.78 (computed from data)
r² = 0.61 (61% variance explained)
② Test Statistic
t = 0.78×√18 / √(1−0.6084)
= 0.78×4.243 / 0.6258 = 5.29
df = n−2 = 18
③ Critical Value
tcrit(18df, α=0.05, two-tail) = ±2.101
④ Decision
5.29 ≫ 2.101 → Reject H₀
Significant linear correlation
Conclusion: Temperature and viscosity are significantly correlated (r=0.78, p<0.001). 61% of viscosity variation is explained by temperature. Proceed to regression analysis.
t-Distribution Rejection Region (df=18)
[Figure: t-distribution (df=18); t=5.29 lies far beyond the ±2.101 rejection bounds, reject H₀]
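In Python, the significance test can be run straight from r (when the raw data is available, scipy.stats.pearsonr returns the same p-value):

from math import sqrt
from scipy.stats import t as t_dist

r, n = 0.78, 20
t = r * sqrt(n - 2) / sqrt(1 - r**2)      # 5.29
print(t, 2 * t_dist.sf(abs(t), n - 2))    # two-tail p < 0.001 → reject H0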
2 & 3 · Regression t-Test & Overall F-Test
after building regression model
When to Use Each
Regression t-test: Tests if each individual coefficient β ≠ 0. One t-test per predictor. "Does this variable contribute significantly?"
Overall F-test: Tests if the entire model explains any variance. "Is the model as a whole significant?" Run this first.
📌 A model can be overall significant (F) but have individual non-significant t's — multicollinearity or redundant predictors
⚡ Interpret t-tests only after confirming Overall F is significant
Regression t-Test Formula
t = b̂ⱼ / SE(b̂ⱼ)
df = n − p − 1    (p = number of predictors)
Overall F-Test Formula
F = MS_Regression / MS_Residual
df₁ = p    df₂ = n − p − 1
Symbol | Meaning | Detail
b̂ⱼ | Estimated regression coefficient for predictor j | How much Y changes per unit increase in Xⱼ, holding the others constant
SE(b̂ⱼ) | Standard error of the coefficient estimate | Uncertainty in b̂ⱼ. From regression output (covariance matrix of the estimates).
MS_Regression | Mean square explained by the model | SS_Regression / p
MS_Residual | Mean square unexplained (error) | SS_Residual / (n−p−1)
Example: Cycle time (sec) regressed on Temperature (X₁) and Pressure (X₂). n=25 observations. Two predictors (p=2).
Overall F-test:
F = 18.7  df=(2,22)
Fcrit(2,22)=3.44
18.7>3.44 → Model significant ✓
R² = 0.63 (63% explained)
Coefficient t-tests (df=22):
b̂₁=2.34, SE=0.61 → t=3.84 ✓ Sig.
b̂₂=0.12, SE=0.19 → t=0.63 ✗ Not sig.

→ Remove X₂ (Pressure) — not contributing
→ Refit model with X₁ only
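In Python, a minimal least-squares sketch showing where the coefficient t's and the overall F come from. The data is synthetic (the study's 25 observations are not listed above), so the printed values only illustrate the mechanics:

import numpy as np

rng = np.random.default_rng(1)
n, p = 25, 2
temp = rng.uniform(180, 220, n)                   # X1
pres = rng.uniform(50, 80, n)                     # X2 (built to contribute nothing)
y = 10 + 2.3 * temp + rng.normal(0, 12, n)

X = np.column_stack([np.ones(n), temp, pres])     # design matrix with intercept
b = np.linalg.lstsq(X, y, rcond=None)[0]          # least-squares coefficients
resid = y - X @ b
mse = resid @ resid / (n - p - 1)                 # MS_Residual, df = n−p−1
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
print(b / se)                                     # one t per coefficient
ss_reg = ((X @ b - y.mean()) ** 2).sum()
print((ss_reg / p) / mse)                         # overall F = MS_Reg / MS_Res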

Normality Tests (4–6) — Prerequisite for Families ① and ④

4 · Shapiro-Wilk Test — Best Normality Test for Small n
H₀: data is normal · n < 50 · W statistic
When to Use This Test
✓ Testing if a dataset follows a normal distribution — n < 50
✓ Most powerful normality test for small samples — default when n is limited
📌 W close to 1 → consistent with normality. Reject H₀ (W small) → non-normal → use Family ⑤
✗ n > 50 — Anderson-Darling is preferred for larger samples
The Formula
W = (Σ aᵢ x₍ᵢ₎)² / Σ(xᵢ − x̄)²
x₍ᵢ₎ = ordered observations (x-order statistics)    aᵢ = expected normal order statistic coefficients    0 < W ≤ 1
Symbol | Meaning | Detail
x₍ᵢ₎ | Order statistics | Your n observations sorted smallest to largest: x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎
aᵢ | Expected normal order statistic coefficients | Tabulated constants. For n=5: a₁=0.6646, a₂=0.2413. Available in Shapiro-Wilk tables.
Numerator (Σaᵢx₍ᵢ₎)² | Weighted linear combination | Measures how well the ordered data matches the expected pattern of a normal distribution
Denominator Σ(xᵢ−x̄)² | Total sum of squares | Unnormalised sample variance. W = 1 only when the ordered data matches the normal pattern exactly.
Engineering Example
Scenario: A quality engineer collects n=12 bearing diameter measurements before running a t-test. First, confirm the data is approximately normal. α = 0.05.
① Data (n=12) and H₀
H₀: data is from a normal distribution
H₁: data is not normal
Data: 25.1,24.8,25.3,25.0,24.9,25.2,
25.1,25.0,24.7,25.4,25.2,24.9 mm
② Sort and apply aᵢ coefficients
x₍₁₎=24.7, x₍₂₎=24.8, ..., x₍₁₂₎=25.4
a₁=0.5475, a₂=0.3325, ... (from table)
b = Σaᵢ(x₍₁₃₋ᵢ₎ − x₍ᵢ₎) = 0.680
③ Compute W
W = b² / Σ(xᵢ−x̄)²
= 0.462 / 0.470 = 0.983
④ Decision
Wcrit(n=12, α=0.05) = 0.859
0.983 > 0.859 → Fail to reject H₀
Data consistent with normality ✓
t-test is appropriate
What W actually measures: W is the ratio of the best linear unbiased estimate of σ² (using the order statistics) to the ordinary sample variance. If the data is truly normal, these two estimates should agree closely, giving W ≈ 1. Non-normal data creates a mismatch: W drops below 1.

When to reject: Reject H₀ when W < W_critical. The critical values come from Shapiro-Wilk tables (n from 3 to 50). In software, use the p-value: reject if p < 0.05.

Practical rule of thumb:
• W > 0.95: strong evidence of normality
• 0.90 < W < 0.95: minor non-normality — t-test usually robust
• W < 0.90: significant non-normality — use non-parametric test

In software: Minitab: Stat → Basic Statistics → Normality Test. R: shapiro.test(x).
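In Python:

from scipy.stats import shapiro

d = [25.1, 24.8, 25.3, 25.0, 24.9, 25.2,
     25.1, 25.0, 24.7, 25.4, 25.2, 24.9]   # bearing diameters, mm
W, p = shapiro(d)
print(W, p)   # W ≈ 0.98, p well above 0.05 → consistent with normality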
5 · Kolmogorov-Smirnov Test — CDF Comparison
empirical vs theoretical CDF · or 2-sample
When to Use This Test
✓ One-sample: compare empirical distribution to any fully specified theoretical distribution
✓ Two-sample: compare two unknown distributions to each other — no normality assumed
📌 The two-sample K-S is a general-purpose distribution equality test — works for any shape
✗ Less powerful than Anderson-Darling at the tails — use A-D for reliability/lifetime data
The Formula
D = sup |Fₙ(x) − F₀(x)|
Fₙ(x) = empirical CDF = proportion of observations ≤ x    F₀(x) = theoretical CDF    D = maximum absolute gap
Symbol | Meaning | Detail
Fₙ(x) | Empirical CDF | Step function: Fₙ(x) = (number of observations ≤ x) / n. Jumps by 1/n at each data point.
F₀(x) | Theoretical CDF | The distribution you are testing against (Normal, Weibull, etc.). Must be fully specified — mean and σ known.
D | K-S statistic | Maximum vertical distance between Fₙ and F₀ anywhere on the x-axis. Larger D = greater departure.
sup | Supremum | The maximum over all x — the worst-case discrepancy between the empirical and theoretical CDFs
Engineering Example
Scenario: n=30 tensile strength measurements. Test if the data follows a Normal(μ=480, σ=22) distribution. α=0.05.
① Build empirical CDF
Sort n=30 observations.
At each xᵢ: Fₙ(xᵢ) = i/30
E.g., 5th value x₍₅₎=455: Fₙ=5/30=0.167
② Compare to Normal CDF
F₀(455) = Φ((455−480)/22) = Φ(−1.14) = 0.127
|Fₙ(455)−F₀(455)| = |0.167−0.127| = 0.040
③ Find maximum gap D
Compute |Fₙ−F₀| at every data point.
D = max of all these differences = 0.121
④ Decision
Dcrit(n=30, α=0.05) = 0.242
0.121 < 0.242 → Fail to reject H₀
Data consistent with N(480, 22²)
Visual intuition: Plot the step function of your sorted data (empirical CDF) alongside the smooth S-curve of the theoretical CDF. D is the largest vertical gap between the two. If this gap exceeds the critical value, the distributions are significantly different.

Two-sample K-S: Instead of comparing to a theoretical F₀, compare two empirical CDFs: D = sup|Fₙ₁(x) − Fₙ₂(x)|. This tests whether two samples come from the same distribution — no assumptions about what that distribution is. Useful for comparing before/after distributions of a process change.

Important limitation: The parameters (μ=480, σ=22) must be specified independently — not estimated from the same data. If you estimate them from data and then test goodness-of-fit, the K-S critical values are no longer correct. Use Lilliefors correction in that case.
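In Python, a sketch with scipy.stats.kstest. The 30 tensile measurements are not listed above, so a simulated sample stands in:

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(7)
data = rng.normal(480, 22, size=30)          # stand-in for the real sample
D, p = kstest(data, 'norm', args=(480, 22))  # F0 fully specified, no Lilliefors issue
print(D, p)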
6 · Anderson-Darling Test — Tail-Sensitive Normality
weights tails · reliability data · Minitab default
When to Use This Test
✓ Testing normality when the distribution tails are important (reliability, extreme events)
✓ Medium to large n (50–200+) where A-D is more powerful than Shapiro-Wilk
📌 Can also test Weibull, exponential, lognormal — not just normal distributions
📌 Default normality test in Minitab — what you see in the "Normality Test" output
The Formula
A² = −n − (1/n) Σ(2i−1)[ln F(x₍ᵢ₎) + ln(1−F(x₍ₙ₊₁₋ᵢ₎))]
x₍ᵢ₎ = sorted observations    F = CDF of the distribution under H₀    Smaller A² = better fit
Symbol | Meaning | Detail
x₍ᵢ₎ | Sorted observations | x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎ — the same order statistics used in Shapiro-Wilk
F(x₍ᵢ₎) | CDF at order statistic i | For a normality test: F(x) = Φ((x−x̄)/s) — evaluated at each data point
(2i−1) | Weight function | Gives extra weight to the i=1 and i=n terms — the tails. This is why A-D is more tail-sensitive than K-S.
A² | Anderson-Darling statistic | Reject H₀ if A² exceeds the critical value for the chosen distribution and α
Engineering Example
Scenario: 25 component lifetime measurements (hours) from an ALT study. Test if lifetimes follow a normal distribution before applying parametric analysis. α = 0.05.
① Sort data and compute Fₙ(x₍ᵢ₎)
Sort n=25 lifetimes, compute x̄ and s.
For each x₍ᵢ₎: F(x₍ᵢ₎) = Φ((x₍ᵢ₎−x̄)/s)
② Apply weighted sum formula
For each i=1..25:
Term_i = (2i−1)[lnF(x₍ᵢ₎)+ln(1−F(x₍₂₆₋ᵢ₎))]
A² = −25 − (1/25)×ΣTerm_i = 0.412
③ Apply correction for estimated parameters
A²* = A²(1 + 4/n − 25/n²)
= 0.412(1+0.16−0.04) = 0.412×1.12 = 0.462
④ Decision
A²crit(α=0.05, normal) = 0.752
0.462 < 0.752 → Fail to reject H₀
Data consistent with normal distribution
p-value ≈ 0.24
Why tail-weighting matters: The term (2i−1) means the first and last observations (the extremes) receive the most weight in the A² sum. This makes Anderson-Darling particularly sensitive to departures from normality in the tails — which is precisely where reliability data is most likely to deviate (early failures, wear-out tails).

A-D for other distributions: Replace F(x) with the Weibull, lognormal, or exponential CDF. Minitab's "Individual Distribution Identification" runs A-D for 14 distributions simultaneously and shows which fits best. For reliability engineers, the Weibull A-D test is often the first step.

A-D vs Shapiro-Wilk: Both good normality tests. Use S-W for n < 50 (more powerful). Use A-D for n > 50 or when testing non-normal distributions. In practice, run both and look at the probability plots — the visual always supplements the test.
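In Python, scipy.stats.anderson returns A² along with critical values at several significance levels. The 25 lifetimes are not listed above, so a simulated sample stands in:

import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(3)
life = rng.normal(1200, 150, size=25)    # stand-in for the ALT lifetimes
res = anderson(life, dist='norm')
print(res.statistic)                     # A² statistic
print(res.significance_level)            # [15, 10, 5, 2.5, 1] (%)
print(res.critical_values)               # compare A² to the 5% entry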

Correlation, Regression & Time Series

Pearson Correlation Coefficient (r)

r value | Interpretation
r = +1 | Perfect positive linear relationship
0 < r < 1 | Positive correlation (as X increases, Y increases)
r = 0 | No linear relationship
−1 < r < 0 | Negative correlation (as X increases, Y decreases)
r = −1 | Perfect negative linear relationship
⚠️ Correlation ≠ Causation

A strong correlation between X and Y does not mean X causes Y. Both may be driven by a third variable (confounding). Example: ice cream sales and drowning rates are positively correlated — both caused by hot weather.

Coefficient of Determination r²

r² = proportion of variance in Y explained by X (0 to 1). If r=0.88 → r²=0.77 → 77% of variance in Y is explained by X. Remaining 23% is unexplained.

Fisher's Z Transformation — CI for Correlation

Since r is not normally distributed, a 3-step process is needed to find CI for the population correlation ρ:

  1. Convert r to z' (Fisher's transformation): z' = 0.5·[ln(1+r) − ln(1−r)]
  2. Build CI in z' space: SE = 1/√(N−3), then z'±zα/2·SE
  3. Back-transform CI limits from z' to r
Worked Example: N=10, r=0.88, 95% CI
Step 1: z' = 0.5[ln(1.88)−ln(0.12)] = 1.375
Step 2: SE = 1/√(10−3) = 0.378
CI = 1.375 ± 1.96×0.378
z' range: 0.635 to 2.11
Step 3: Back-transform → r: 0.56 to 0.97
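In Python, the three steps take a few lines (np.arctanh is exactly Fisher's transformation, and np.tanh inverts it):

import numpy as np

N, r = 10, 0.88
z = np.arctanh(r)                 # Step 1: z' = 0.5·ln((1+r)/(1−r)) ≈ 1.375
se = 1 / np.sqrt(N - 3)           # Step 2: SE ≈ 0.378
lo, hi = z - 1.96 * se, z + 1.96 * se
print(np.tanh(lo), np.tanh(hi))   # Step 3: back-transform → ≈ 0.56 to 0.97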

Regression & Time Series — Strongest Upgrade (Real Data + Visual Learning)

NIST-aligned learning system
Use the graph to understand the data structure first. Then model. Then diagnose. Then decide.

This upgrade follows the NIST/SEMATECH engineering-statistics philosophy: graphics are not decoration, and modeling should never be separated from diagnostics. For regression, that means fit + residuals + structure checks. For time series, that means trend + seasonality + dependence before forecasting.

Real-data example: Anscombe Data Set I

NIST uses Anscombe's example to show why graphics are essential. We start with Data Set I, which behaves approximately linearly and is appropriate for a simple linear regression. The model is Y = β₀ + β₁X + ε. Least squares chooses the line that minimizes the sum of squared residuals.

X: 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5 Y₁: 8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68

What users should learn from this example

  • Slope: change in Y for one-unit change in X.
  • Intercept: fitted Y when X = 0.
  • R²: how much of the Y variation is explained by X.
  • Residuals: the model's errors — the real diagnostic layer.
Equation
ŷ = 3.00 + 0.50x
The slope tells the engineering effect size.
Correlation
r = 0.816
A strong positive association exists, but correlation alone is never enough.
R²
0.667
Explained variation, not proof that the model is correct.
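In Python, the whole example is reproducible in a few lines:

import numpy as np

# Anscombe Data Set I: fit, correlation, and the residual diagnostic layer
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
slope, intercept = np.polyfit(x, y, 1)    # ≈ 0.50 and ≈ 3.00
r = np.corrcoef(x, y)[0, 1]               # ≈ 0.816, so R² ≈ 0.667
residuals = y - (intercept + slope * x)   # plot these before trusting the line
print(slope, intercept, r, r**2)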
Fit + Residual Diagnostics in One View
This is how to teach regression properly: not just the line, but the line plus its mistakes. NIST emphasizes residual analysis because the line by itself can be misleading.
[Figure: Anscombe Data Set I, fitted line (left) and residual plot (right)]
How to explain it: the left panel answers “what line fits the data?” The right panel answers “are the residuals random enough that the linear model is reasonable?” A good model shows residuals centered around zero without curvature, fanning, or a strong trend.
When simple linear regression is appropriate

One response, one predictor, approximately linear relationship, no strong time-order dependence, and residual variation that is roughly constant.

What to check before trusting the model

Scatter plot shape, residual plot, unusual points, leverage/influence, and whether the physics actually supports a straight-line relationship.

Real-data example: Anscombe's Quartet

NIST uses Anscombe's quartet to prove a crucial lesson: four data sets can have nearly identical summary statistics and regression results, yet have completely different structures. That means numbers alone can hide the truth.

Why this belongs in your site

  • Users immediately understand why plots matter.
  • It prevents blind trust in slope, r, and R².
  • It visually explains linearity, outliers, curvature, and leverage.
Same Statistics. Different Reality.
All four data sets have nearly the same mean, slope, intercept, and correlation, but the scatter plots tell completely different stories.
[Figure: scatter plots of the four Anscombe data sets]
Teaching message: Data set I is approximately linear. Data set II is curved. Data set III has an outlier. Data set IV is dominated by one influential point. This is exactly why the NIST handbook treats exploratory graphics as essential, not optional.
What graphs reveal that summary statistics hide

Curvature, clusters, outliers, leverage points, unequal spread, and poor experimental design.

Best practice to teach users

Always look at the scatter plot first, then fit the model, then inspect residuals. Never reverse that order.

Real-data example: NIST monthly CO₂ concentrations

The NIST handbook uses monthly CO₂ concentrations from Mauna Loa as a sample time-series data set. Time-series data must be treated differently from ordinary regression data because the observations are ordered in time and can have trend, seasonality, and autocorrelation.

CO₂ (1974–1977 subset): 333.13, 332.09, 331.10, 329.14, 327.36, 327.29, 328.23, 329.55, 330.62, 331.40, ...

What users should learn from this example

  • Trend: the long-term level is rising.
  • Seasonality: there is a repeating annual pattern.
  • Smoothing: moving averages reveal the underlying path.
  • Modeling rule: identify the structure before forecasting.
Run Sequence View — Trend + Seasonality
This real NIST sample shows why time-series analysis exists: the data are not just a cloud of independent points. The 12-point moving average helps reveal the underlying level.
[Figure: NIST CO₂ monthly series with 12-point moving average]
How to explain it: the raw line has short-term variation, but the bigger story is a rising long-term level with a repeating annual cycle. If users fit a simple straight line and ignore the seasonal pattern, they will miss important structure.
Seasonal Subseries View — See the Repeating Cycle Directly
NIST highlights seasonal subseries plots as a tool for detecting seasonality when the period is known. For monthly data, the period is usually 12.
[Figure: NIST CO₂ seasonal subseries plot, period 12]
Teaching message: this view makes the seasonality obvious. In this CO₂ subset, the series peaks around May and falls through late summer/early autumn. That repeating structure is exactly what seasonal methods are designed to capture.
Time-series workflow users should remember

Plot the series, check for trend, check for seasonality, check for dependence, smooth only to reveal structure, then choose a forecasting method.

When not to use ordinary regression alone

When data are collected over time and adjacent observations are related. Independence is no longer a safe assumption.

Reliability Engineering

Quantitative methods for predicting, measuring, and improving product reliability — from MTBF calculations to Weibull analysis and system configuration modeling.

Core Reliability Metrics

Six numbers tell the complete reliability story of any system. Understanding how they connect — and what levers you pull to improve each — is the foundation of reliability engineering.

How the Metrics Connect — Follow the Chain
Observed Data (count of failures F, total operating hours T)
→ Failure Rate: λ = F / T (failures/hr)
→ MTBF = 1 / λ (mean hrs between failures)
+ MTTR = repair hours ÷ failures (mean hrs to repair)
→ Availability = MTBF / (MTBF + MTTR) (fraction of time operational)
→ FIT Rate = λ × 10⁹ (failures per billion hr)
→ R(t) = e^(−λt) (probability of still working at time t)
📋 Worked Example — Industrial Pump System

A fleet of 10 pumps operated for 50,000 hours total. During this period, 5 failures were recorded with a total repair time of 20 hours.

λ = 5 ÷ 50,000 = 0.0001 failures/hr
MTBF = 50,000 ÷ 5 = 10,000 hr
MTTR = 20 ÷ 5 = 4 hr per repair
A = 10,000 ÷ (10,000 + 4) = 99.96%
FIT = 0.0001 × 10⁹ = 100,000 FIT
R(2,000) = e^(−2000/10000) = 81.9%
R(t) — Reliability Decay over Time (MTBF = 10,000 hr)
[Figure: exponential reliability decay with MTBF = 10,000 hr; R(t) passes through 81.9% at t = 2,000 hr]
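In Python, the full metric chain for the pump fleet:

from math import exp

failures, hours, repair_hours = 5, 50_000, 20
lam = failures / hours                   # λ = 0.0001 failures/hr
mtbf = 1 / lam                           # 10,000 hr
mttr = repair_hours / failures           # 4 hr per repair
availability = mtbf / (mtbf + mttr)      # 0.9996 → 99.96%
fit = lam * 1e9                          # 100,000 FIT
print(availability, fit, exp(-lam * 2000))   # R(2,000 hr) ≈ 0.819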

The Four Fundamental Functions — How They Derive from Each Other

Every reliability distribution is built from a single starting point: the probability density function f(t). All other functions follow by integration or differentiation. This is the NIST 8.1.6 framework — not four separate formulas, but one coherent system.

① f(t) — Failure Density (PDF)
f(t) ≥ 0, ∫₀^∞ f(t) dt = 1
Probability of failure in the instant [t, t+dt]
— integrate from 0 to t ↓
② F(t) — Cumulative Failure (CDF)
F(t) = ∫₀ᵗ f(u) du
F(0) = 0, F(∞) = 1
Fraction of the population that has failed by time t
— complement: R(t) = 1 − F(t) ↓
③ R(t) — Reliability / Survival
R(t) = 1 − F(t) = ∫ₜ^∞ f(u) du = exp[−H(t)]
Fraction of the population surviving beyond time t
— ratio: h(t) = f(t) / R(t), and back again: f(t) = h(t)·R(t) ↓
④ h(t) — Hazard Rate
h(t) = f(t) / R(t) = −d[ln R(t)]/dt
Instantaneous failure risk, given survival to t. Integrates to H(t), and R(t) = exp[−H(t)].
Master Formula
R(t) = exp[−H(t)] = exp[−∫₀ᵗ h(u) du]

Every reliability distribution is fully specified by its hazard function h(t). The shape of h(t) determines the failure behaviour — decreasing, constant, or increasing — which maps directly to the three phases of the bathtub curve.

Hazard Rate h(t) — Three Shapes, Three Stories

The hazard function h(t) is the most informative reliability curve. Its shape tells you what kind of failure mechanism is at work and what action to take.

Decreasing h(t) — DFR
Early-life / Infant Mortality
[Curve: decreasing h(t), Weibull β < 1]
When: Manufacturing defects, poor welds, wrong parts. Failures happen early and then rate drops.
Action: Burn-in testing, incoming inspection, supplier qualification.
Constant h(t) — CFR
Useful Life (Random Failures)
[Curve: constant h(t), Exponential / Weibull β = 1]
When: Random external events, human error, overstress. Failures don't depend on age (memoryless).
Action: MTBF tracking, redundancy design, maintenance intervals.
Increasing h(t) — IFR
Wear-out / End of Life
[Curve: increasing h(t), Weibull β > 1]
When: Fatigue, corrosion, mechanical wear, degradation with use. Older = more likely to fail.
Action: Preventive maintenance schedules, replacement before B10 life.

Key Distributions — Formula Sets

Two distributions cover the majority of reliability engineering problems. Know their hazard shapes and when to use each.

Exponential Distribution — h(t) = λ (constant)
f(t) = λ·e^(−λt)
F(t) = 1 − e^(−λt)
R(t) = e^(−λt)
h(t) = λ   (memoryless)
MTTF = 1/λ · Use for: electronic components in useful life, random failure events
Weibull Distribution — h(t) = (β/η)(t/η)^(β−1)
R(t) = e^(−(t/η)^β)
F(t) = 1 − e^(−(t/η)^β)
h(t) = (β/η)(t/η)^(β−1)
MTTF = η·Γ(1+1/β)
β: shape · η: characteristic life · Use for: bearings, fatigue, wear-out — any failure phase

Quick Reference — Model Selection Guide

Model | R(t) Formula | h(t) Shape | β (Weibull) | Use When | Typical Applications
Exponential | e^(−λt) | Constant ─ | β = 1 | Random, memoryless failures | Electronics, software, random events
Weibull (β<1) | e^(−(t/η)^β) | Decreasing ↘ | 0.5–0.9 | Infant mortality, manufacturing defects | Early field failures, weld defects
Weibull (β>1) | e^(−(t/η)^β) | Increasing ↗ | 2–4 typical | Wear-out, fatigue, ageing | Bearings, tyres, mechanical wear
Lognormal | 1 − Φ[(ln t−µ)/σ] | Peaks then drops | — | Fatigue crack propagation, corrosion | Metals fatigue, semiconductor oxide
Normal | 1 − Φ[(t−µ)/σ] | Increasing ↗ | — | Tight wear-out with known life | Light bulbs, precision wear mechanisms
Gamma | 1 − I(λt, k) | Varies with k | — | Systems requiring k failures to fail | Standby redundancy, shock models
💡

Which model to choose? Plot your data on Weibull probability paper first. If it falls on a straight line, Weibull fits. If the β you estimate is 1.0, use the simpler exponential. Only choose lognormal or normal when engineering knowledge of the failure mechanism supports it.

The Bathtub Curve — Failure Rate Over Product Lifetime

The bathtub curve describes how the failure rate λ(t) changes across a product's life. Three distinct phases require different engineering strategies.

Failure Rate λ(t) vs Time — The Classic Bathtub Curve
[Figure: bathtub curve; infant mortality (decreasing λ, β<1, burn-in testing), useful life (constant λ, β=1, preventive maintenance; MTBF is calculated from this phase), wear-out (increasing λ, β>1, scheduled replacement)]
Phase 1 — Infant Mortality

Decreasing Failure Rate

High initial failure rate that falls rapidly. Caused by manufacturing defects, design weaknesses, and substandard components.

  • Burn-in / ESS testing
  • Process improvement (SPC)
  • Incoming inspection
Phase 2 — Useful Life

Constant Failure Rate

Low, approximately constant random failure rate. MTBF = 1/λ applies here. Normal operating life of the product.

  • Exponential distribution (β=1)
  • Preventive maintenance
  • Redundancy design
Phase 3 — Wear-Out

Increasing Failure Rate

Failure rate rises as components age, fatigue, or corrode. Planned maintenance replaces components before this phase starts.

  • Predictive maintenance
  • Scheduled replacement (B10)
  • Weibull β > 1

Weibull Analysis — The Universal Reliability Distribution

The Weibull distribution models all three bathtub phases by adjusting a single parameter β. It's the most widely used distribution in reliability engineering.

Reliability Function
R(t) = exp[−(t/η)^β]
Probability of surviving to time t
Cumulative Failure
F(t) = 1 − exp[−(t/η)^β]
Fraction failed by time t
Hazard Rate
h(t) = (β/η) × (t/η)^(β−1)
Instantaneous failure rate at time t
Mean Time to Failure
MTTF = η × Γ(1 + 1/β)
Expected life; Γ = gamma function
B10 Life
B10 = η × (−ln 0.90)^(1/β)
Time at which 10% of units have failed
📊 Weibull Hazard Rate h(t) for Different β Values
[Figure: Weibull h(t) for β<1 (infant mortality), β=1 (exponential/random), β=2 (early wear-out), β=3.5 (normal-like wear-out)]

Interpreting β

β < 1
Infant mortality. Failure rate decreasing. Manufacturing or design defects. Burn-in recommended.
β = 1
Constant random failures. Useful-life phase. Exponential distribution. MTBF = η.
β = 2
Early wear-out. Linearly increasing hazard. Ball bearings, seals, O-rings.
β ≈ 3.5
Normal-like wear-out. Common for mechanical fatigue, gears, springs. Symmetric failure distribution.
📌

Characteristic Life η: Always the time at which 63.2% of units fail, regardless of β. F(η) = 1 − e⁻¹ = 0.632. On a Weibull probability plot, η is where the fitted line crosses the 63.2% horizontal.

Generalised Bx Life — Beyond B10

B10 is the automotive standard, but any Bx life (the time by which x% of units have failed) can be computed directly from the Weibull parameters. This is the NIST-standard approach (NIST 8.2.2).

General Bx Formula — Time at Which x% Have Failed
B_x = η · [−ln(1 − x/100)]^(1/β)
B1 Life (1% failure)
η · [−ln(0.99)]^(1/β)
= η · (0.01005)^(1/β)
B10 Life (10% failure)
η · [−ln(0.90)]^(1/β)
= η · (0.10536)^(1/β)
B50 Life (50% failure)
η · [−ln(0.50)]^(1/β)
= η · (0.69315)^(1/β)
Worked Example — η = 8,000 hr, β = 2.5
B1 = 8000 · (0.01005)^(1/2.5) = 8000 · 0.1588 = 1,270 hr
B10 = 8000 · (0.10536)^(1/2.5) = 8000 · 0.4065 = 3,252 hr
B50 = 8000 · (0.69315)^(1/2.5) = 8000 · 0.8636 = 6,909 hr
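In Python, the general formula in one function, reproducing the values above:

from math import log

def bx_life(eta, beta, x_percent):
    # B_x = η · [−ln(1 − x/100)]^(1/β)
    return eta * (-log(1 - x_percent / 100)) ** (1 / beta)

for x in (1, 10, 50):
    print(x, round(bx_life(8000, 2.5, x)))   # ≈ 1,270 / 3,252 / 6,909 hr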

Weibull Probability Plotting — Rank Regression Method (NIST 8.2.2)

The Weibull probability plot linearises the Weibull CDF so failure data falls on a straight line — slope gives β, x-intercept at 63.2% gives η. The step-by-step NIST procedure:

Step 1 — Rank the Failure Times

Order n failures as t₁ < t₂ < … < tₙ. Assign median rank (Benard's approximation):

F̂ᵢ = (i − 0.3) / (n + 0.4)
More accurate than i/n for small samples. NIST-recommended. Also used in ReliaSoft Weibull++.
Step 2 — Linearise the CDF

Take double log of both sides of R(t) = e^(−(t/η)^β):

ln[ln(1/(1−F))] = β·ln(t) − β·ln(η)
Y = β·X − β·ln(η)
Y = ln[ln(1/(1−F))], X = ln(t). Plot Y vs X — should be linear for Weibull.
Step 3 — Fit & Extract Parameters

Fit straight line to (ln(tᵢ), ln[ln(1/(1−F̂ᵢ))]) by least squares:

β̂ = slope of fitted line
η̂ = exp(−intercept / β̂)
Or: read η directly where the line crosses F = 63.2% on the Weibull paper.
Worked Example — 5 Bearings
Failures at: 850, 1100, 1350, 1600, 2100 hr
n = 5, Benard ranks:
i=1: F̂ = 0.70/5.4 = 0.130
i=2: F̂ = 1.70/5.4 = 0.315
i=3: F̂ = 2.70/5.4 = 0.500
i=4: F̂ = 3.70/5.4 = 0.685
i=5: F̂ = 4.70/5.4 = 0.870
→ Plot, fit line → β̂ ≈ 3.0, η̂ ≈ 1,580 hr
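In Python, the complete rank-regression procedure for the five bearings (numpy.polyfit performs the Step-3 least-squares fit):

import numpy as np

t = np.array([850, 1100, 1350, 1600, 2100])   # ordered failure times, hr
i = np.arange(1, len(t) + 1)
F = (i - 0.3) / (len(t) + 0.4)                # Benard median ranks

x = np.log(t)                                 # X = ln(t)
y = np.log(np.log(1 / (1 - F)))               # Y = ln ln[1/(1−F)]
beta, b0 = np.polyfit(x, y, 1)                # slope = β̂, intercept = −β̂·ln(η̂)
eta = np.exp(-b0 / beta)
print(round(beta, 2), round(eta))             # ≈ 3.0 and ≈ 1,580 hr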

📈 Weibull Quick Ref

  • η (Characteristic Life)

    63.2% of units fail by η. Always true regardless of β.

  • B10 Life

    Time by which 10% of units fail. Standard bearing and automotive spec metric.

  • Weibull Probability Plot

    Plot ln(ln(1/(1−F))) vs ln(t). Slope = β. Intercept gives η. Straight line confirms Weibull fit.

  • Random Number Gen.

x = η(−ln ξ)^(1/β) where ξ ~ Uniform(0,1) — inverting F(t) gives a Weibull failure time. From the Stockholm Distributions Handbook.

Series vs Parallel Systems

📊 Series (all must work) vs Parallel (at least one must work)
[Figure: series R_sys = 0.9 × 0.9 × 0.9 = 0.729, always below the worst component; parallel R_sys = 1 − (0.1 × 0.1) = 0.99, redundancy dramatically improves reliability]
Series System — ALL must work
Rsys = R₁ × R₂ × R₃ × … × Rₙ
Any single failure kills the system. Reliability always lower than weakest component.
Parallel System — ANY one works
Rsys = 1 − (1−R₁)(1−R₂)…(1−Rₙ)
Redundancy. All must fail for system failure. Higher reliability than best component.
💡

Design implication: Critical single-point failures (no redundancy = series) dramatically reduce system reliability. Adding even one parallel backup to a 0.9-reliability component raises that subsystem from 0.90 to 0.99 — a 10× reduction in its failure probability (0.10 → 0.01).
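In Python, both configurations reduce to one-line functions (independent components assumed):

from math import prod

def series(rs):
    return prod(rs)                         # all must work

def parallel(rs):
    return 1 - prod(1 - r for r in rs)      # any one suffices

print(series([0.9, 0.9, 0.9]))    # 0.729
print(parallel([0.9, 0.9]))       # 0.99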

Probability Foundations for Reliability — NIST 8.1.6

Reliability is fundamentally a probability — the probability that a device performs its intended function during a specified period under stated conditions. The four-function framework below is the mathematical backbone of all reliability analysis, per NIST Engineering Statistics Handbook Section 8.1.6.

The Four Functions — Complete Derivation Chain

NIST 8.1.6 — Mathematical Relationships Between Reliability Functions
① Probability Density Function f(t)
Definition: f(t) = dF(t)/dt
Requirements: f(t) ≥ 0, ∫₀^∞ f(t)dt = 1
Meaning: instantaneous failure rate density
② Cumulative Distribution Function F(t)
F(t) = P(T ≤ t) = ∫₀ᵗ f(u)du
F(0) = 0, lim F(t) = 1 as t→∞
Meaning: fraction failed by time t
③ Reliability (Survival) Function R(t)
R(t) = 1 − F(t) = P(T > t)
= ∫ₜ^∞ f(u)du
Meaning: probability of surviving beyond t
④ Hazard Function h(t) — the Key Function
h(t) = f(t) / R(t)
= −d[ln R(t)] / dt
Meaning: conditional failure rate at time t
given survival to t
Master Formula — Integrating the Hazard Function
H(t) = ∫₀ᵗ h(u)du [Cumulative Hazard Function]
R(t) = exp[−H(t)] ← universally valid
f(t) = h(t)·R(t) = h(t)·exp[−H(t)]
MTTF = ∫₀^∞ R(t)dt = E[T]

Five Distributions — h(t), R(t), F(t), f(t) Side-by-Side

Distribution | h(t) Hazard | R(t) Reliability | F(t) CDF | MTTF | Shape
Exponential | λ (constant) | e^(−λt) | 1 − e^(−λt) | 1/λ | Flat — useful life, β=1
Weibull | (β/η)(t/η)^(β−1) | exp[−(t/η)^β] | 1 − exp[−(t/η)^β] | η·Γ(1+1/β) | Power — all phases
Lognormal | φ(z)/[σt·Φ(−z)], z=(ln t−µ)/σ | 1 − Φ[(ln t−µ)/σ] | Φ[(ln t−µ)/σ] | exp(µ+σ²/2) | IFR then DFR — fatigue, corrosion
Normal | φ(z)/[1−Φ(z)], z=(t−µ)/σ | 1 − Φ[(t−µ)/σ] | Φ[(t−µ)/σ] | µ | IFR — tight wear-out
Gamma | Complex — see NIST 8.1.9 | 1 − I(t/β, k) (incomplete gamma) | I(t/β, k) | — | k<1: DFR, k=1: Exp, k>1: IFR

Hazard Function Shapes — The Physical Meaning

DFR — Decreasing
dh/dt < 0

Failure rate decreases with time. Indicates infant mortality — early failures remove weak units. Example: Weibull β < 1, Gamma k < 1.

CFR — Constant
dh/dt = 0

Constant failure rate. Memoryless — age does not affect remaining life. Exponential distribution. Example: electronic components in useful life.

IFR — Increasing
dh/dt > 0

Failure rate increases — component ages and wears out. Weibull β > 1, Normal, most mechanical components under fatigue and corrosion.

Bathtub (Mixed)
DFR → CFR → IFR

Real-world products combine all three phases. The lognormal has a unimodal hazard — rises then falls. Mixed Weibull populations generate bathtub curves.

Types of Events — Probability Rules

Mutually Exclusive

Cannot occur simultaneously.

P(A ∩ B) = 0
Independent Events

A's occurrence doesn't affect P(B).

P(B|A) = P(B)
Complementary

A' is the event that A does NOT occur.

P(A') = 1 − P(A)
Rule of Addition — Union (A or B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Mutually exclusive: P(A ∪ B) = P(A) + P(B)
Rule of Multiplication — Intersection (A and B)
P(A ∩ B) = P(A) × P(B|A)
Independent events: P(A ∩ B) = P(A) × P(B)

MTBF Worked Examples

Example 1 — MTTF vs MTBF

100 items tested for 10,000 hours. 5 items failed at 5,000 hours.

MTTF = (95×10,000 + 5×5,000) / 100
= 975,000 / 100 = 9,750 hrs

MTBF = (95×10,000 + 5×5,000) / 5
= 975,000 / 5 = 195,000 hrs
💡

MTTF divides by total units (100); MTBF divides by failed units only (5)

Example 2 — Hazard Function Derivation

Exponential distribution with λ = 0.001 failures/hr. Find h(t), R(t) at t = 500 hr:

f(t) = 0.001·e^(−0.001t)
F(t) = 1 − e^(−0.001t)
R(t) = e^(−0.001t)
h(t) = f(t)/R(t) = λ = 0.001 (constant)

R(500) = e^(−0.5) = 0.6065 = 60.65%
💡

Exponential h(t) = λ always — this is the memoryless property

Fault Tree Analysis — Top-Down Deductive Reliability

Fault Tree Analysis (FTA) is a top-down, deductive technique that models how a defined system failure (the top event) can occur through combinations of component failures and human errors. It uses Boolean logic gates to trace failure pathways, and it underpins the treatments in MIT 22.38 and MIL-STD-1629A. It complements FMEA: FTA asks "what combinations of events cause this failure?" while FMEA asks "what does each component failure cause?"

Academic foundation: MIT 22.38 (Prof. Golay) — Section I: Event Sequence Identification & Section XII: Probabilistic Risk Assessment · Rausand & Høyland, System Reliability Theory 2nd Ed. (Wiley, 2003) · MIL-STD-1629A FMECA procedures

The Logic Gates — Boolean Building Blocks

AND
AND Gate
All inputs must fail

Output event occurs only if all input events occur simultaneously. Represents redundancy — protective when components are independent.

P(T) = P(A) × P(B) × P(C)
Valid only when A, B, C are independent
OR
OR Gate
Any input causes failure

Output event occurs if at least one input event occurs. Most common gate — represents that any single failure propagates upward.

P(T) = 1 − (1−P(A))(1−P(B))(1−P(C))
Exact for independent events
BASIC
Basic Event
Leaf node — no further decomposition

The lowest-level failure event in the tree. Has an assigned failure probability λ (from field data, MIL-HDBK-217F, or manufacturer's data).

P(event) = 1 − e^(−λt) ≈ λt for small λt
UNDEVEL
Undeveloped Event
Not further analysed

An event not developed further — either insufficient data, or judged insufficiently important. Marked explicitly so reviewers know it was a conscious decision.

The FTA Process — 6 Steps

1
Define the Top Event precisely
State the exact undesired event: not "pump fails" but "pump fails to deliver ≥10 L/min at system pressure within 5 sec of demand signal." Ambiguity at this step invalidates everything below.
2
Define system boundaries and assumptions
What is in scope? What interfaces are excluded? What is the mission time? What is the operating environment? All stated explicitly.
3
Construct the tree top-down using Boolean gates
Decompose the top event into immediate causes connected by AND/OR gates. Continue decomposing each intermediate event until basic events are reached. Never skip levels.
4
Identify Minimal Cut Sets (MCS)
A cut set is a set of basic events whose simultaneous occurrence causes the top event. A minimal cut set cannot be reduced further — removing any element prevents the top event. These are the system's vulnerabilities.
5
Quantify — assign failure probabilities
Assign P(failure) to each basic event from field data, MIL-HDBK-217F, OREDA, or manufacturer specs. Propagate probabilities upward through gates.
6
Evaluate importance measures — prioritise action
Use Birnbaum importance, Fussell-Vesely importance, and Risk Reduction Worth (RRW) to rank which basic events most affect top-event probability. Focus design improvements on high-importance events.

Minimal Cut Sets — The Mathematics

For a system with minimal cut sets K₁, K₂, …, Kₘ, the top event T occurs if any cut set occurs completely. Using the inclusion-exclusion principle:

Exact (inclusion-exclusion)
P(T) = Σ P(Kᵢ) − Σ P(Kᵢ ∩ Kⱼ)
       + Σ P(Kᵢ ∩ Kⱼ ∩ Kₖ) − …
Rare Event Approximation (λt ≪ 1)
P(T) ≈ Σᵢ P(Kᵢ) = Σᵢ ∏ⱼ∈Kᵢ qⱼ
where qⱼ = probability of basic event j. Valid when P(T) ≪ 1.
Worked Example — Pump Failure System
Top event: Loss of cooling water
Minimal cut sets: K₁ = {A,B}, K₂ = {C}, K₃ = {A,D}
q_A = 0.01  q_B = 0.02
q_C = 0.005  q_D = 0.03

P(K₁) = 0.01 × 0.02 = 2×10⁻⁴
P(K₂) = 0.005
P(K₃) = 0.01 × 0.03 = 3×10⁻⁴

P(T) ≈ 2×10⁻⁴ + 5×10⁻³ + 3×10⁻⁴
      = 5.5×10⁻³
K₂ (single point {C}) dominates — highest priority for design improvement.
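A sketch of the quantification above — the rare-event approximation plus the second-order inclusion-exclusion term, using the worked example's basic-event probabilities:

```python
from itertools import combinations
from math import prod

# Minimal cut sets and basic-event probabilities from the worked example.
q = {"A": 0.01, "B": 0.02, "C": 0.005, "D": 0.03}
cut_sets = [{"A", "B"}, {"C"}, {"A", "D"}]

p_cut = [prod(q[e] for e in k) for k in cut_sets]
rare = sum(p_cut)                              # first-order (rare-event) term ≈ 5.5e-3

# Second-order correction: P(Ki ∩ Kj) = product over the union of their events
pairs = sum(prod(q[e] for e in ki | kj) for ki, kj in combinations(cut_sets, 2))

print(f"P(T) ≈ {rare:.2e} (rare event), {rare - pairs:.2e} with 2nd-order term")
```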

Component Importance Measures

Measure | Formula | Interpretation | Use when
Birnbaum (Structural) | I_B(i) = ∂P(T)/∂qᵢ | Rate of change of top-event probability with respect to component i's failure probability | Comparing sensitivity — which component improvement gives the biggest P(T) reduction?
Fussell-Vesely | I_FV(i) = P(at least one MCS containing i fails) / P(T) | Fraction of total risk contributed by cut sets containing component i | Maintenance prioritisation — where does this component contribute most to risk?
Risk Reduction Worth (RRW) | RRW(i) = P(T) / P(T | qᵢ=0) | Factor by which P(T) decreases if component i is made perfect (qᵢ → 0) | Investment decisions — what is the maximum achievable benefit of improving component i?
Risk Achievement Worth (RAW) | RAW(i) = P(T | qᵢ=1) / P(T) | Factor by which P(T) increases if component i is guaranteed to fail | Maintenance criticality — how important is it to keep this component working?
FTA vs FMEA — Complementary, Not Competing
Fault Tree Analysis
▸ Top-down (deductive)
▸ Starts from a specific failure
▸ Finds all combinations that cause it
▸ Handles complex logic & dependencies
▸ Quantitative probability output
▸ Best for: safety-critical top events
FMEA / FMECA
▸ Bottom-up (inductive)
▸ Starts from each component
▸ Traces all effects of each failure
▸ Covers the full system comprehensively
▸ RPN prioritisation (qualitative)
▸ Best for: comprehensive coverage of all failure modes

Best practice: use FMEA first for broad coverage, then FTA for deep analysis of the highest-severity failure modes identified by FMEA. Together they give both breadth and depth.

Reliability Block Diagrams — System Architecture & Redundancy

A Reliability Block Diagram (RBD) is a success-oriented model that shows how components must function for the system to function. Unlike FTA which models failure, RBD models success paths. Based on MIT 22.38 Section IX (Simple Logical Configurations) and Rausand & Høyland Chapter 4.

Academic foundation: MIT 22.38 Section IX — Complex Systems, Stress-Strength Interference, Markov Models · Rausand & Høyland, System Reliability Theory 2nd Ed. Ch. 4 · IEC 61078:2016 — Reliability block diagram techniques

Series, Parallel, and k-out-of-n Systems

Series System — Chain of Single Points

All components must work

System fails if any single component fails. Reliability is always lower than the weakest component. The engineering challenge: every component is a single point of failure.

Rs = R₁ × R₂ × R₃ × … × Rₙ
For n equal components R each:
Rs = Rⁿ   → rapidly decreasing
Worked Example — 4 pumps in series, R = 0.95 each:
Rs = 0.95⁴ = 0.8145  (18.6% chance of failure)
[Diagram: series RBD — IN → C₁ → C₂ → C₃ → OUT]
Active Parallel — Full Redundancy

Any one component is sufficient

System fails only if all parallel components fail. Reliability always exceeds the best single component. Each component runs continuously (hot standby).

Rs = 1 − ∏ᵢ(1 − Rᵢ) = 1 − (1−R)ⁿ
Worked Example — 3 pumps in parallel, R = 0.90 each:
Rs = 1 − (1−0.90)³ = 1 − 0.001 = 0.999
[Diagram: active-parallel RBD — C₁, C₂, C₃ side by side between IN and OUT]

k-out-of-n Systems — Voting Architectures

A k-out-of-n system succeeds if at least k of n components function. This generalises both series (k=n) and parallel (k=1). Common in safety systems: 2-out-of-3 voting gives high reliability without the cost of full parallel redundancy.

General Formula (equal components, R each)
Rk/n = Σⱼ₌ₖⁿ C(n,j) × Rʲ × (1−R)ⁿ⁻ʲ
where C(n,j) = n! / [j!(n−j)!] is the binomial coefficient. This is the binomial reliability sum from k to n.
Worked Example — 2-out-of-3 voting, R = 0.90
P(≥2 work) = C(3,2)×0.9²×0.1¹ + C(3,3)×0.9³×0.1⁰
= 3×0.81×0.1 + 1×0.729
= 0.243 + 0.729 = 0.972
Compare: pure parallel 1-out-of-3 = 0.999 (higher), series 3-out-of-3 = 0.729 (lower). 2-out-of-3 is the safety engineer's sweet spot.
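The binomial sum is a one-liner to implement. This sketch reproduces all three architectures from the comparison above:

```python
from math import comb

def r_k_of_n(k: int, n: int, r: float) -> float:
    """Reliability of a k-out-of-n system of identical components (binomial sum)."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))

# Series (3-of-3), 2-of-3 voting, and pure parallel (1-of-3) at R = 0.90
for k in (3, 2, 1):
    print(f"{k}-out-of-3 at R=0.90: {r_k_of_n(k, 3, 0.90):.3f}")
# -> 0.729, 0.972, 0.999 — matching the worked example
```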

Standby Redundancy — Cold, Warm, and Hot

Type | Standby State | Switch Reliability | Reliability Formula | Application
Hot Standby | Fully powered, running at full load — instant takeover | Near 1.0 (automatic) | Same as active parallel: R = 1 − (1−R)ⁿ | Aircraft hydraulics, nuclear safety systems
Warm Standby | Partially energised — reduced failure rate λ_s < λ during standby | High, with brief startup | Requires Markov model — intermediate between hot/cold | Generator sets, server farms
Cold Standby | De-energised — zero failure rate during standby | R_sw required (switch may fail) | R_s = e^(−λt)(1 + λt) for 1-unit standby with perfect switch | Backup pumps, emergency systems
Cold Standby — Derivation (1 active + 1 standby, perfect switch)
P(system works at t) = P(active survives to t) + P(active fails at τ < t, standby takes over and survives to t)
= e^(−λt) + ∫₀ᵗ λe^(−λτ) · e^(−λ(t−τ)) dτ
= e^(−λt) + λt·e^(−λt)
= e^(−λt)(1 + λt)
Significantly higher than active parallel [1 − (1 − e^(−λt))²] because the standby unit does not accumulate ageing during the standby period.
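A quick numerical comparison of the two formulas (λ and the mission times below are illustrative, not from the text):

```python
import numpy as np

# Cold standby R = e^(-λt)(1+λt) vs active parallel 1-(1-e^(-λt))²,
# assuming a perfect switch; λ = 0.001/hr is an illustrative rate.
lam = 0.001
for t in (500, 1000, 2000):
    cold = np.exp(-lam * t) * (1 + lam * t)
    parallel = 1 - (1 - np.exp(-lam * t)) ** 2
    print(f"t={t}: cold standby {cold:.4f} vs active parallel {parallel:.4f}")
```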

Stress-Strength Interference — MIT 22.38 Section IX.3

A component fails when applied stress S exceeds its strength R. Both are random variables. Reliability = P(R > S). This is the probabilistic basis for design margins.

General Formula
R = P(Strength > Stress)
R = ∫₋∞^∞ f_S(s) · [1 − F_R(s)] ds
= ∫₋∞^∞ f_S(s) · P(R > s) ds
Normal-Normal Case (analytical result)
If S ~ N(µ_S, σ_S²) and R ~ N(µ_R, σ_R²):
(R−S) ~ N(µ_R−µ_S, σ_R²+σ_S²)
Reliability = Φ[(µ_R−µ_S) / √(σ_R²+σ_S²)]
= Φ[z_margin]
This z_margin is the "reliability index" β used in structural reliability and ISO 2394.

Accelerated Life Testing — Compressing Time to Failure

ALT subjects products to stresses (temperature, voltage, vibration, humidity) higher than normal use conditions to induce failures faster, then models the stress-life relationship to extrapolate reliability at use conditions. The core challenge: accelerate only the same failure mechanisms that would occur in service.

References: Elsayed, Reliability Engineering (Addison-Wesley, 1996) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · MIL-HDBK-217F Reliability Prediction · IEC 60068 Environmental Testing Standards · University of Maryland ENRE 641 — Accelerated Life Testing course

The Three Primary Life-Stress Models

Model 1 — Arrhenius (Temperature)

Most widely used ALT model

Derived from the Arrhenius equation for chemical reaction rates. Valid when the dominant failure mechanism is thermally activated — oxidation, corrosion, electromigration, diffusion, creep.

Life-Temperature Relationship
L(T) = A · exp(E_a / kT)
L = characteristic life (B50, MTTF, η), A = pre-exponential constant, E_a = activation energy (eV), k = Boltzmann constant = 8.617×10⁻⁵ eV/K, T = temperature in Kelvin
Acceleration Factor
AF = L_use / L_test = exp[E_a/k × (1/T_use − 1/T_test)]
AF tells you how many use-hours one test-hour represents
Worked Example — Semiconductor Oxide Degradation
E_a = 0.7 eV (oxide degradation)
T_use = 55°C = 328 K
T_test = 125°C = 398 K

AF = exp[(0.7 / 8.617×10⁻⁵) × (1/328 − 1/398)]
= exp[8123 × (0.003049 − 0.002513)]
= exp[8123 × 0.000536]
= exp[4.354]
= 77.8×
Interpretation: 1,000 hours at 125°C = 77,800 hours (≈8.9 years) at 55°C use temperature — with E_a = 0.7 eV
Typical E_a values:
0.3–0.5 eV: Electromigration in Al
0.5–0.7 eV: Oxide breakdown
0.7–1.0 eV: Corrosion mechanisms
1.0–1.4 eV: Si-SiO₂ interface traps
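A small calculator for the acceleration factor; the call below reproduces the oxide-degradation example (temperatures in kelvin, as in the text):

```python
from math import exp

K_BOLTZMANN = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev: float, t_use_k: float, t_test_k: float) -> float:
    """AF = exp[Ea/k · (1/T_use − 1/T_test)], temperatures in kelvin."""
    return exp(ea_ev / K_BOLTZMANN * (1.0 / t_use_k - 1.0 / t_test_k))

# Oxide degradation: Ea = 0.7 eV, 55 °C use (328 K) vs 125 °C test (398 K)
print(f"AF ≈ {arrhenius_af(0.7, 328.0, 398.0):.1f}")  # ≈ 77.9 (the text rounds to 77.8)
```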
Model 2 — Inverse Power Law (Voltage / Stress)

For non-thermal stress: voltage, load, pressure

Used when failure mechanism is driven by mechanical stress, voltage, or other non-thermal accelerants. L(S) follows a power law relationship with the stress level S.

Life-Stress Relationship
L(S) = C / Sⁿ
C = constant, S = stress level (V, MPa, Hz), n = inverse power law exponent (fitted from data)
AF = (S_test / S_use)ⁿ
Typical n: 2–4 for dielectric breakdown, 3–6 for capacitor voltage stress
Worked Example — Capacitor Voltage Stress
Rated voltage: V_use = 50V
Test voltage: V_test = 100V
Power law exponent: n = 4

AF = (100/50)⁴ = 2⁴ = 16×
Interpretation: Testing at 2× rated voltage compresses time by 16× for a power law exponent of 4
Model 3 — Eyring (Temperature + Second Stress)

Extends Arrhenius to include a second stress variable (humidity, voltage, vibration). Derived from quantum mechanics (reaction rate theory). Used in humidity + temperature testing (85°C/85% RH, JEDEC JESD22-A101).

Generalised Eyring Model
L(T,V) = (A/T) · exp(E_a/kT) · exp(−(B + C/T)·V)
T = temperature (K), V = second stress, A, B, C = model parameters fitted from multi-stress test data
Common Multi-Stress ALT Test Conditions
Test | Stress 1 | Stress 2 | Standard
HAST | 130°C | 85% RH | JESD22-A110
85/85 | 85°C | 85% RH | JESD22-A101
THB | 85°C | 85% RH + bias | AEC-Q100
HTOL | 125–150°C | Full voltage | JESD22-A108

HALT, HASS, and ESS — Qualitative vs Quantitative ALT

HALT

Highly Accelerated Life Test

Apply stepwise increasing stress (temperature, vibration, both combined) to failure. Goal: find the operating limit and destruct limit. Qualitative — not intended for life prediction, but for design robustness discovery.

Used at: design validation phase. Output: design margins, failure modes to address before production.
HASS

Highly Accelerated Stress Screening

Production screen applied to every unit (or sample). Uses stress levels below HALT destruct limits to precipitate latent defects before shipment without consuming life of good units.

Used at: production. Output: defect escape rate reduction, infant mortality elimination.
ESS

Environmental Stress Screening

Temperature cycling and/or random vibration screen applied post-assembly. MIL-HDBK-2164 defines profiles. Addresses infant mortality phase of bathtub curve — forces early failures to occur in factory, not in the field.

Typical profile: −40°C to +70°C, 5 cycles, 3–5 G_rms vibration. Governed by MIL-HDBK-2164A.

Combining Weibull with ALT — Life Data Analysis

In ALT data analysis, Weibull distribution is fitted at each stress level. The assumption is that the shape parameter β is constant across stress levels (same failure mechanism), while the scale parameter η changes with stress according to the life-stress model.

Arrhenius-Weibull Model
η(T) = A · exp(E_a / kT)
R(t,T) = exp[−(t/η(T))^β]
β = constant (same failure mechanism)
Parameters estimated via Maximum Likelihood Estimation (MLE) from pooled data across all stress levels.
ALT Data Analysis Workflow
01 Run tests at ≥3 stress levels above use stress
02 Fit Weibull to each stress level — verify β is consistent
03 Plot ln(η) vs 1/T — confirm linearity (Arrhenius)
04 Estimate E_a from slope of ln(η) vs 1/T line
05 Extrapolate η to use stress using life-stress model
06 Compute R(t) and B10 at use conditions with confidence bounds
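Steps 03–05 reduce to a linear regression of ln(η) on 1/T. The sketch below uses hypothetical fitted η values at three test temperatures to show the mechanics; none of the numbers come from the text:

```python
import numpy as np

K = 8.617e-5                                   # Boltzmann constant, eV/K
T = np.array([398.0, 423.0, 448.0])            # test temperatures (K)
eta = np.array([1200.0, 450.0, 190.0])         # HYPOTHETICAL Weibull η at each T (hr)

# ln η = ln A + (Ea/k)·(1/T)  →  slope of ln η vs 1/T is Ea/k
slope, intercept = np.polyfit(1.0 / T, np.log(eta), 1)
ea = slope * K
print(f"estimated Ea ≈ {ea:.2f} eV")

t_use = 328.0                                  # use temperature (K)
eta_use = np.exp(intercept + slope / t_use)    # step 05: extrapolate η to use stress
print(f"extrapolated η(328 K) ≈ {eta_use:,.0f} hr")
```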

Reliability Demonstration Testing — Proving What You Claim

A reliability demonstration test (RDT) answers a specific question: "Can I claim with C% confidence that the true reliability is at least R* at time t?" It requires defining a reliability target, a confidence level, a mission time, and a test termination criterion — before running a single unit.

References: Meeker & Escobar, Statistical Methods for Reliability Data Ch. 10 (Wiley, 1998) · MIL-HDBK-781 — Reliability Testing for Engineering Development, Qualification, and Production · IEC 61124 — Reliability Testing: Compliance Tests for Constant Failure Rate and Constant Failure Intensity

The Mathematics of Demonstration Testing

The fundamental statistical basis: if a sample of n units is tested and c failures are observed, the lower confidence bound on the true failure probability p at a given confidence level C is derived from the binomial distribution (or Poisson for time-terminated tests).

Zero-Failure Test (c = 0) — Success Run
Claim: R* at confidence C
Required sample: n = ln(1−C) / ln(R*)
Or: n = ln(α) / ln(R*)  where α = 1−C
If all n units pass (zero failures), you can claim reliability ≥ R* at confidence C. The most efficient test when you expect very high reliability.
With c Failures Allowed (Binomial basis)
1 − C = Σⱼ₌₀ᶜ C(n,j) · (1−R*)ʲ · (R*)ⁿ⁻ʲ
Solve for n given C, R*, and allowed failures c
Allowing failures increases n required but reduces the risk of falsely rejecting a good product.
Sample Size Required — Zero Failure Test
Reliability R* | 90% Confidence | 95% Confidence | 99% Confidence
0.90 | 22 | 29 | 44
0.95 | 45 | 59 | 90
0.99 | 230 | 299 | 459
0.999 | 2,302 | 2,995 | 4,603
0.9999 | 23,026 | 29,957 | 46,051
Formula: n = ln(1−C) / ln(R*). Demonstrating very high reliability requires enormous samples — the practical argument for ALT.
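The success-run formula is simple to automate; this sketch regenerates the table rows:

```python
from math import ceil, log

def success_run_n(r_star: float, confidence: float) -> int:
    """Zero-failure sample size: n = ln(1−C) / ln(R*), rounded up."""
    return ceil(log(1 - confidence) / log(r_star))

for r in (0.90, 0.95, 0.99, 0.999):
    print(r, [success_run_n(r, c) for c in (0.90, 0.95, 0.99)])
# 0.90 -> [22, 29, 44]; 0.95 -> [45, 59, 90]; 0.99 -> [230, 299, 459]; ...
```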

Time-Terminated Tests — Poisson Basis

When units are tested for a fixed time T (each), total accumulated test time = n × T. For an exponential (constant failure rate) model, the number of failures follows a Poisson distribution. This allows MTBF/failure rate demonstration.

Lower Confidence Bound on MTBF
MTBF_lower = 2T_total / χ²(α, 2c+2)
T_total = total accumulated test time
c = observed failures
α = 1 − C (risk level)
χ² quantile from chi-squared distribution with 2c+2 degrees of freedom
Zero-Failure Time Test (c=0)
MTBF_lower = 2T_total / χ²(α, 2) = −T_total / ln(α)
χ²(α,2) = −2 ln(α) for 2 degrees of freedom
Worked Example — MTBF Demonstration
Requirement: MTBF ≥ 5,000 hr
Confidence required: 90% (α = 0.10)
Test plan: 10 units × 1,000 hr each
T_total = 10,000 hr
Result: 0 failures observed

MTBF_lower = −10,000 / ln(0.10)
= 10,000 / 2.303
= 4,343 hr
✗ 4,343 hr < 5,000 hr — the claim is NOT yet demonstrated
With zero failures, demonstrating MTBF ≥ 5,000 hr at 90% confidence requires T_total ≥ 5,000 × 2.303 ≈ 11,513 hr (e.g., 12 units × 1,000 hr each)
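A sketch of the bound using scipy's chi-squared quantile, reproducing the example above:

```python
from scipy.stats import chi2

def mtbf_lower_bound(t_total: float, failures: int, confidence: float) -> float:
    """Lower confidence bound on MTBF for a time-terminated exponential test:
    MTBF_lower = 2·T_total / χ²(2c+2), using the C-quantile (upper-tail α = 1−C)."""
    return 2.0 * t_total / chi2.ppf(confidence, 2 * failures + 2)

# Zero-failure example: 10,000 hr accumulated, 0 failures, 90% confidence
print(f"MTBF lower bound ≈ {mtbf_lower_bound(10_000, 0, 0.90):,.0f} hr")  # ≈ 4,343 hr
```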

Producer & Consumer Risk — The OC Curve for Reliability

Consumer Risk (β)

Probability that a product with reliability below the requirement passes the test. A false accept.

β = P(accept | R < R*)
Typically ≤ 0.10 for safety-critical systems. Reducing β requires larger n or fewer allowed failures.
Producer Risk (α)

Probability that a product with reliability above the requirement fails the test. A false reject.

α = P(reject | R ≥ R*)
Allowed c > 0 reduces producer risk. The discrimination ratio d = R_acceptable / R_rejectable controls the sharpness of the OC curve.
Discrimination Ratio d

Ratio between the MTBF that should be accepted (θ₁) and the MTBF that should be rejected (θ₀).

d = θ₁ / θ₀ ≥ 1
Larger d → easier to discriminate → smaller test required. MIL-HDBK-781 defines standard test plans for d = 1.5, 2.0, 3.0.
Sources for this module: MIT OCW 22.38 (Prof. M. Golay) — Probability and Its Applications to Reliability, Quality Control, and Risk Assessment · Rausand & Høyland, System Reliability Theory 2nd Ed. (Wiley, 2003) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · Elsayed, Reliability Engineering (Addison-Wesley, 1996) · MIL-HDBK-217F, MIL-HDBK-781, MIL-STD-1629A · IEC 60300, IEC 61078, IEC 61124 · University of Maryland ENRE 641

Distribution Functions — Complete Reliability Toolkit

NIST 8.1.7–8.1.9 covers the full family of distributions used in reliability engineering. Each distribution is defined by its hazard function shape — choosing the right one is not a statistical preference but a physical claim about the failure mechanism.

NIST reference: Engineering Statistics Handbook Sections 8.1.7 (Exponential), 8.1.8 (Weibull), 8.1.9 (Lognormal, Normal, Gamma) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · Nelson, Accelerated Testing (Wiley, 1990)

Exponential Distribution — NIST 8.1.7

The exponential is the only continuous distribution with the memoryless property: P(T > t+s | T > t) = P(T > s). A component that has survived to time t has the same remaining life distribution as a new component. This applies only during the useful-life phase (constant failure rate).

Complete Formula Set — Exponential(λ)
f(t) = λ·e^(−λt), t ≥ 0
F(t) = 1 − e^(−λt)
R(t) = e^(−λt)
h(t) = λ (constant)
H(t) = λt
MTTF = 1/λ
Var(T) = 1/λ²
Median = ln(2)/λ = 0.693/λ
Worked Example — Electronic Component Reliability
Component failure rate: λ = 2×10⁻⁵ failures/hr
MTBF = 1/λ = 50,000 hr

R(t=8760 hr) = e^(−2×10⁻⁵ × 8760)
= e^(−0.1752) = 83.9% (1-year reliability)

R(t=40000) = e^(−0.8) = 44.9%
P(fail before 40,000 hr) = 55.1%
⚠️

Common misconception: MTBF = 50,000 hr does NOT mean the component lasts 50,000 hr. It means ~63.2% fail BEFORE 50,000 hr. At t = MTBF, R(MTBF) = e⁻¹ = 36.8% survive.

Lognormal Distribution — NIST 8.1.9

If ln(T) ~ Normal(µ, σ²), then T ~ Lognormal(µ, σ). Best for failure mechanisms driven by multiplicative damage accumulation: fatigue, corrosion, crack propagation. The hazard function is unimodal — rises then decreases (IFR then DFR), making it physically realistic for degradation processes.

Complete Formula Set — Lognormal(µ, σ)
f(t) = φ[(ln t−µ)/σ] / (σt)
F(t) = Φ[(ln t−µ)/σ]
R(t) = 1 − Φ[(ln t−µ)/σ]
h(t) = f(t)/R(t) [no closed form]
MTTF = exp(µ + σ²/2)
Median = e^µ
Var(T) = e^(2µ+σ²)·(e^(σ²)−1)
φ = standard normal PDF, Φ = standard normal CDF
Worked Example — Fatigue Life of Steel Shaft

µ = 10.5, σ = 0.8 (in ln-hours). Find R at 30,000 hr and MTTF.

z = (ln(30000) − 10.5) / 0.8
= (10.309 − 10.5) / 0.8 = −0.239
R(30000) = 1 − Φ(−0.239) = 59.4%

MTTF = exp(10.5 + 0.8²/2) = exp(10.82) ≈ 50,011 hr
Applications: Fatigue, corrosion, stress-corrosion cracking, electromigration, semiconductor oxide breakdown, biological failure times

Normal Distribution in Reliability

The Normal(µ, σ) is appropriate when failure times have a symmetric distribution — tight wear-out mechanisms where fatigue accumulates uniformly. The hazard function is strictly increasing (IFR), making it suitable for components that reliably wear out at a predictable age.

Complete Formula Set — Normal(µ, σ)
f(t) = φ[(t−µ)/σ] / σ
F(t) = Φ[(t−µ)/σ]
R(t) = 1 − Φ[(t−µ)/σ]
h(t) = φ(z) / [σ(1−Φ(z))] strictly IFR
MTTF = µ
B10 = µ − 1.282σ (10th percentile)
Worked Example — Brake Pad Wear-Out

µ = 60,000 km, σ = 8,000 km. Find B10 and R at 45,000 km.

B10 = 60,000 − 1.282×8,000 = 49,744 km
R(45,000) = 1 − Φ[(45000−60000)/8000]
= 1 − Φ(−1.875) = 97.0%
Applications: Mechanical wear-out (pistons, gears, brake pads), light bulb filament life, highly-controlled manufacturing processes

Gamma Distribution — NIST 8.1.9

Gamma(k, β) is the distribution of the sum of k independent exponential(1/β) random variables. Shape parameter k controls hazard function shape: k < 1 gives DFR, k = 1 gives exponential, k > 1 gives IFR.

Complete Formula Set — Gamma(k, β)
f(t) = t^(k−1)·e^(−t/β) / [β^k·Γ(k)]
F(t) = I(t/β, k) [incomplete gamma ratio]
R(t) = 1 − I(t/β, k)
MTTF = kβ
Var(T) = kβ²
Mode = (k−1)β for k ≥ 1
Distribution Selection Guide — NIST
Failure Mechanism | Best Distribution
Constant random failures | Exponential
Infant mortality / any phase | Weibull
Fatigue, corrosion, crack growth | Lognormal
Symmetric, tight wear-out | Normal
Sum of k failure events | Gamma
Unknown — fit all, use AIC/BIC | Probability plot comparison

Parameter Estimation — MLE, Rank Regression & Censored Data

Fitting a reliability distribution to field or test data is a statistical inference problem. Two main methods: Maximum Likelihood Estimation (MLE) — the NIST-preferred method for accuracy and confidence interval generation — and Rank Regression — graphical, intuitive, and useful for small samples. Both must handle censored data correctly.

NIST reference: Engineering Statistics Handbook Sections 8.2.1 (Kaplan-Meier), 8.2.2 (Probability Plotting), 8.2.4 (Confidence Intervals), 8.2.6 (MLE) · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) Chapters 3–5

Censoring — The Core Challenge of Reliability Data

Complete (Exact) Failure

The exact failure time tᵢ is known. Contributes f(tᵢ) to the likelihood. The ideal case — often impractical in life testing.

L contribution: f(tᵢ)
Right Censored (Suspended)

Unit survived to time cᵢ (end of test or withdrawal). We know T > cᵢ but not exact failure time. Most common type.

L contribution: R(cᵢ) = 1−F(cᵢ)
Left Censored

Unit already failed before first inspection at time dᵢ. We know T < dᵢ. Common in inspection data.

L contribution: F(dᵢ)
Interval Censored

Failure in interval [Lᵢ, Rᵢ] — inspected OK at Lᵢ, failed at Rᵢ. Very common in periodic inspection.

L contribution: F(Rᵢ) − F(Lᵢ)

Maximum Likelihood Estimation (MLE) — NIST 8.2.6

MLE finds the parameter values that make the observed data most probable. For mixed censored data with r failures and (n−r) censored units:

Full Likelihood — Mixed Censored Data
L(θ) = C · ∏ᵢ∈failures f(tᵢ; θ) · ∏ⱼ∈censored R(cⱼ; θ)

Log-likelihood: ℓ(θ) = Σᵢ ln f(tᵢ) + Σⱼ ln R(cⱼ)

Maximise ℓ(θ) by solving: ∂ℓ/∂θ = 0 (numerically)
MLE for Weibull — Score Equations
∂ℓ/∂β: r/β + Σ ln(tᵢ) − (1/ηᵝ)Σ tᵢᵝ ln(tᵢ) = 0
∂ℓ/∂η: −rβ/η + (β/η^(β+1))Σ tᵢᵝ = 0
→ Solve numerically (Newton-Raphson or EM algorithm)
MLE Advantages (NIST-preferred)
  • Asymptotically unbiased and efficient
  • Handles all censoring types correctly
  • Provides Fisher information for confidence intervals
  • Can be used with covariates (regression models)
  • Standard in Minitab, ReliaSoft Weibull++
MLE Confidence Intervals — Fisher Matrix Method
Var(θ̂) ≈ [−∂²ℓ/∂θ²]⁻¹ (Fisher information)

95% CI on R(t): use log-log transform
θ = ln(−ln R̂(t))
Var(θ) ≈ [Σ dᵢ/(nᵢ(nᵢ−dᵢ))] / [ln R̂(t)]²
CI: R̂(t)^exp(±1.96√Var(θ))
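A minimal MLE sketch for Weibull data with right censoring, numerically maximising the mixed-censoring log-likelihood above (the failure and suspension times reuse the Kaplan-Meier example in the next section; scipy's Nelder-Mead does the optimisation):

```python
import numpy as np
from scipy.optimize import minimize

fail = np.array([500.0, 1100.0, 1400.0, 2200.0, 2700.0, 3200.0])  # exact failures
cens = np.array([800.0, 1800.0])                                  # suspensions

def neg_loglik(params):
    beta, eta = params
    if beta <= 0 or eta <= 0:
        return np.inf
    ln_f = (np.log(beta / eta) + (beta - 1) * np.log(fail / eta)
            - (fail / eta) ** beta)           # failures contribute ln f(t)
    ln_r = -(cens / eta) ** beta              # suspensions contribute ln R(c)
    return -(ln_f.sum() + ln_r.sum())

res = minimize(neg_loglik, x0=[1.0, float(np.median(fail))], method="Nelder-Mead")
beta_hat, eta_hat = res.x
print(f"beta ≈ {beta_hat:.2f}, eta ≈ {eta_hat:.0f} hr")
```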

Kaplan-Meier Estimator — Non-Parametric Survival (NIST 8.2.1)

The Kaplan-Meier estimator computes the empirical survival function without assuming any parametric form. Essential for exploratory analysis. Correctly handles right-censored data (suspended items).

KM Formula
R̂(t) = ∏ᵢ: tᵢ≤t (1 − dᵢ/nᵢ)

where:
tᵢ = ordered failure times
dᵢ = deaths (failures) at tᵢ
nᵢ = units at risk just before tᵢ
(includes censored units still alive)
KM is the NPMLE (non-parametric MLE) of R(t) — statistically optimal, not just heuristic. Greenwood's formula gives the variance: Var[R̂(t)] ≈ [R̂(t)]² · Σ dᵢ/[nᵢ(nᵢ−dᵢ)]
KM Example — 8 Units, 2 Censored
Events: 500, 800†, 1100, 1400, 1800†, 2200, 2700, 3200
(† = censored/suspended)

t=500: n=8, d=1 R̂ = 1·(7/8) = 0.875
t=1100: n=6, d=1 R̂ = 0.875·(5/6) = 0.729
t=1400: n=5, d=1 R̂ = 0.729·(4/5) = 0.583
t=2200: n=3, d=1 R̂ = 0.583·(2/3) = 0.389
t=2700: n=2, d=1 R̂ = 0.389·(1/2) = 0.194
t=3200: n=1, d=1 R̂ = 0.194·(0/1) = 0.000
Censored units at 800 and 1800 hr drop from the risk set but are accounted for via reduced nᵢ at subsequent events.
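The estimator is short enough to write by hand. This sketch reproduces the 8-unit example step by step:

```python
# Kaplan-Meier sketch reproducing the 8-unit example above: (time, failed?)
events = [(500, True), (800, False), (1100, True), (1400, True),
          (1800, False), (2200, True), (2700, True), (3200, True)]

at_risk = len(events)
r_hat = 1.0
for time, failed in sorted(events):      # walk through ordered event times
    if failed:
        r_hat *= 1 - 1 / at_risk         # KM factor (1 − d_i/n_i), d_i = 1 here
        print(f"t={time}: n={at_risk}, R-hat = {r_hat:.3f}")
    at_risk -= 1                         # failures and censorings both leave the risk set
```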

Competing Failure Modes & Stress-Strength Interference

Real systems fail from multiple distinct mechanisms — corrosion, fatigue, overload — acting simultaneously in competition. A single Weibull distribution fitted to mixed data gives misleading results. Understanding competing failure modes and probabilistic stress-strength interaction is essential for design and maintenance decisions.

Reference: NIST 8.1.10 — Competing Failure Modes · Meeker & Escobar Ch. 15 · Nelson, Accelerated Testing Ch. 11 (Wiley, 1990) · MIT 22.38 Section IX.3 — Stress-Strength Interference

Competing Failure Modes — The Series System of Mechanisms

If a unit can fail by any of k independent modes, the system survives only if all modes survive. This is a series reliability model on the failure mechanisms:

Competing Failure Modes — Key Equations
T = min(T₁, T₂, …, Tₖ) where Tᵢ = time to failure by mode i

R_sys(t) = R₁(t) · R₂(t) · … · Rₖ(t) (if modes are independent)
F_sys(t) = 1 − ∏ᵢ [1 − Fᵢ(t)]
h_sys(t) = h₁(t) + h₂(t) + … + hₖ(t) ← hazard functions ADD

For exponential modes: λ_sys = λ₁ + λ₂ + … + λₖ

Mixed Weibull Populations — Bimodal Failure Data

2-Component Mixture Weibull
F(t) = p·F₁(t) + (1−p)·F₂(t)
f(t) = p·f₁(t) + (1−p)·f₂(t)
R(t) = p·R₁(t) + (1−p)·R₂(t)

where p = fraction from subpopulation 1
F₁(t) = Weibull(β₁, η₁) [infant mortality]
F₂(t) = Weibull(β₂, η₂) [wear-out]
Note: Mixture R(t) ≠ product of component R(t). This is a mixture of populations, not a series system.
Worked Example — Electronic Assembly

10% of assemblies have a solder defect (β₁=0.6, η₁=200 hr), 90% are good (β₂=3.5, η₂=12,000 hr).

p = 0.10, (1−p) = 0.90

At t = 100 hr:
F₁(100) = 1 − exp[−(100/200)^0.6] = 0.483
F₂(100) ≈ 0.000
F_mix(100) = 0.10×0.483 + 0.90×0 = 4.8%

At t = 8,000 hr:
F₁(8000) ≈ 1.000
F₂(8000) = 1 − exp[−(8000/12000)^3.5] = 0.215
F_mix(8000) = 0.10×1.000 + 0.90×0.215 = 29.3%
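A quick check of the mixture arithmetic above:

```python
import numpy as np

# F_mix(t) = p·F1(t) + (1−p)·F2(t) for the two-subpopulation example.
def weibull_cdf(t, beta, eta):
    return 1.0 - np.exp(-(t / eta) ** beta)

p = 0.10
for t in (100.0, 8000.0):
    f1 = weibull_cdf(t, 0.6, 200.0)       # defective subpopulation (infant mortality)
    f2 = weibull_cdf(t, 3.5, 12000.0)     # good subpopulation (wear-out)
    print(f"t={t:>6.0f}: F1={f1:.3f}, F2={f2:.3f}, F_mix={p*f1 + (1-p)*f2:.3f}")
# t=100: F1=0.483, F2=0.000, F_mix=0.048 ; t=8000: F1=1.000, F2=0.215, F_mix=0.293
```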

Stress-Strength Interference Model — NIST 8.1.11

General Formula
R = P(Strength > Stress) = P(R > S)

R = ∫₋∞^∞ f_S(s) · [1 − F_R(s)] ds
= ∫₋∞^∞ f_S(s) · P(R > s) ds
Normal-Normal Analytical Solution
S ~ N(µ_S, σ_S²), R ~ N(µ_R, σ_R²)
(R−S) ~ N(µ_R−µ_S, σ_R²+σ_S²)

Reliability = Φ[z]
z = (µ_R − µ_S) / √(σ_R² + σ_S²)
z = "reliability index" β
Worked Example — Shaft Design
R ~ N(µ_R=500 MPa, σ_R=40 MPa)
S ~ N(µ_S=350 MPa, σ_S=30 MPa)

z = (500 − 350) / √(40² + 30²)
= 150 / √2500 = 150/50 = 3.0

Reliability = Φ(3.0) = 99.865%
P(failure) = 1,350 ppm

Safety factor = µ_R/µ_S = 500/350 = 1.43
→ Safety factor 1.43 → 1,350 ppm failure
→ z = 3.0 is the real risk metric
📌

Safety Factor vs z: A high deterministic safety factor with high variability may give worse reliability than a lower safety factor with tight distributions. The reliability index z accounts for both mean margins AND variability — it is the true engineering measure of safety.
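The normal-normal case needs nothing beyond Φ. This sketch reproduces the shaft example using only the standard library:

```python
from statistics import NormalDist

# Normal-normal interference: reliability = Φ(z),
# z = (µ_R − µ_S) / sqrt(σ_R² + σ_S²).
def interference_reliability(mu_r, sd_r, mu_s, sd_s):
    z = (mu_r - mu_s) / (sd_r**2 + sd_s**2) ** 0.5
    return z, NormalDist().cdf(z)

z, rel = interference_reliability(500, 40, 350, 30)
print(f"z = {z:.1f}, reliability = {rel:.5f}, P(fail) ≈ {(1 - rel) * 1e6:,.0f} ppm")
# z = 3.0, reliability = 0.99865, ≈ 1,350 ppm — matching the worked example
```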

Maintainability & Availability — Repairable Systems

Most real-world systems are repairable. Reliability alone is insufficient; engineering must also quantify maintainability (ease and speed of repair) and availability (net fraction of time the system is operational). This section covers NIST 8.4 and renewal theory fundamentals.

Reference: NIST Engineering Statistics Handbook Section 8.4 · Rausand & Høyland, System Reliability Theory 2nd Ed. Ch. 10 · IEC 60300-3-5 · MIL-HDBK-470A Designing and Developing Maintainable Products and Systems

Three Levels of Availability

Inherent Availability A_i

Design-level — ideal conditions

Considers only corrective maintenance. Ignores PM time, logistics, supply delays. The theoretical maximum.

A_i = MTBF / (MTBF + MTTR)
Achieved Availability A_a

Operations — CM + PM included

Includes corrective and preventive maintenance downtime. Does not include logistics/administrative delays.

A_a = MTBM / (MTBM + M̄)
MTBM = Mean Time Between Maintenance (all types), M̄ = mean active maintenance time
Operational Availability A_o

Real-world — all delays included

Includes logistics delay time (LDT) and administrative delay time (ADT). The real-world user experience.

A_o = Uptime / (Uptime + Downtime)
Always: A_o ≤ A_a ≤ A_i

Steady-State Availability — Markov Model Derivation

Two-State Markov Model — Exact Derivation
Transitions:
UP → DOWN at rate λ (failure)
DOWN → UP at rate µ = 1/MTTR

A(t) = µ/(λ+µ) + [λ/(λ+µ)]·e^(−(λ+µ)t)

Steady-state (t → ∞):
A(∞) = µ/(λ+µ) = MTBF/(MTBF+MTTR)
Worked Example
MTBF = 1,000 hr, MTTR = 4 hr
A(∞) = 1000/1004 = 99.60%

MTBF → 2,000 hr (2× reliability improvement):
A = 2000/2004 = 99.80% (+0.20%)

MTTR → 2 hr (2× maintainability improvement):
A = 1000/1002 = 99.80% (+0.20%)

→ Equal gain! Compare investment costs.
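A sketch of the transient and steady-state availability for the worked example's λ and µ:

```python
import numpy as np

# Two-state Markov availability for the worked example:
# MTBF = 1,000 hr (λ = 0.001/hr), MTTR = 4 hr (µ = 0.25/hr).
lam, mu = 1.0 / 1000.0, 1.0 / 4.0

def availability(t):
    s = lam + mu
    return mu / s + (lam / s) * np.exp(-s * t)   # A(t) from the derivation above

for t in (0.0, 5.0, 20.0, 100.0):
    print(f"A({t:>5.0f} hr) = {availability(t):.4f}")
print(f"steady state = {mu / (lam + mu):.4f}")    # 1000/1004 ≈ 0.9960
```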

Optimal PM Interval — Cost Minimisation

Cost-Based Optimal PM Interval

C_P = planned PM cost, C_F = corrective failure cost. Under an age-replacement policy (planned PM at age t, corrective replacement on failure), optimise the long-run cost rate:

C(t) = [C_P·R(t) + C_F·F(t)] / M(t)

M(t) = ∫₀ᵗ R(u)du [expected cycle length]

Solve dC(t)/dt = 0 → find t*
For Weibull β > 1 (IFR) a finite optimum t* exists; the stronger the wear-out (higher β) and the larger C_F/C_P, the earlier the optimum. PM is ineffective if β ≤ 1 (CFR or DFR) — run-to-failure is optimal.
Worked Example — Pump Seal

β=2.5, η=3,000 hr. C_P=£500 (planned), C_F=£8,000 (failure + downtime cost).

t* ≈ 870 hr (found numerically — see the sketch below)
F(870) = 1 − exp[−(870/3000)^2.5] = 0.044

Cost rate with PM ≈ £0.97/hr
Run-to-failure: £8,000/MTTF = £8,000/2,662 hr ≈ £3.01/hr
→ PM cuts the maintenance cost rate by roughly two-thirds
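The optimum is found by a simple numerical search over the cost-rate curve. A sketch for the pump-seal example, using the age-replacement formula from this section:

```python
import numpy as np
from scipy.integrate import quad

# C(t) = [C_P·R(t) + C_F·F(t)] / ∫₀ᵗ R(u)du with Weibull(β=2.5, η=3000).
beta, eta, c_p, c_f = 2.5, 3000.0, 500.0, 8000.0
R = lambda u: np.exp(-(u / eta) ** beta)

def cost_rate(t):
    m, _ = quad(R, 0.0, t)                       # expected cycle length M(t)
    return (c_p * R(t) + c_f * (1.0 - R(t))) / m

ts = np.arange(200.0, 3000.0, 10.0)
rates = [cost_rate(t) for t in ts]
best = int(np.argmin(rates))
print(f"t* ≈ {ts[best]:.0f} hr, minimum cost rate ≈ £{rates[best]:.2f}/hr")
# → roughly t* ≈ 870 hr at ≈ £0.97/hr, vs ≈ £3.0/hr for run-to-failure
```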

Renewal Theory — HPP vs NHPP (NIST 8.3)

Repairable systems restored to "as good as new" follow a Homogeneous Poisson Process (HPP). Partially-repaired systems follow a Non-Homogeneous Poisson Process (NHPP) with time-dependent intensity ρ(t).

HPP — "As Good As New" Repairs
Inter-failure times: iid Exponential(λ)
E[N(t)] = λt
Var[N(t)] = λt
Test: cumulative failures vs t is linear
NHPP — Crow-AMSAA Power Law
ρ(t) = λβt^(β−1)
E[N(t)] = λt^β
β < 1: improving (reliability growth)
β = 1: HPP (constant)
β > 1: worsening (reliability decay)
MLE: β̂ = n / Σᵢ ln(T/tᵢ)
Reliability Growth — AMSAA Crow Example
System tested for T=2,000 hr. 12 failures.

β̂ = 12 / [Σᵢ ln(2000/tᵢ)]
= 12 / 21.4 ≈ 0.560

β̂ = 0.56 < 1 → reliability is growing

Projected failures at T=4,000 hr:
λ̂ = n/T^β̂ = 12/2000^0.56 ≈ 0.170
E[N(4000)] = 0.170 × 4000^0.56 ≈ 17.7 (equivalently 12 × 2^0.56)
📌

MIL-HDBK-189C: Plot cumulative failures vs ln(t) on log-log paper. A straight line confirms the Power Law NHPP. Slope = β. Standard for reliability growth tracking during development testing.
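An MLE sketch for the power-law NHPP; the failure times below are hypothetical but constructed so that Σ ln(T/tᵢ) ≈ 21.4, matching the example:

```python
import numpy as np

# Crow-AMSAA (power-law NHPP) MLE. T = 2,000 hr, n = 12; the times are
# HYPOTHETICAL, chosen so that Σ ln(T/t_i) ≈ 21.4 as in the example.
T = 2000.0
t = np.array([14, 45, 100, 190, 300, 430, 580, 700, 930, 1200, 1520, 1900.0])

beta_hat = len(t) / np.sum(np.log(T / t))       # β̂ = n / Σ ln(T/t_i)
lam_hat = len(t) / T**beta_hat                  # λ̂ = n / T^β̂
print(f"beta ≈ {beta_hat:.2f} ({'improving' if beta_hat < 1 else 'worsening'})")
print(f"projected E[N(4000)] = λ̂·4000^β̂ ≈ {lam_hat * 4000**beta_hat:.1f}")
```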

Sources for tabs 10–13: NIST Engineering Statistics Handbook Sections 8.1.7–8.1.11, 8.2.1, 8.2.4, 8.2.6, 8.3, 8.4 · Meeker & Escobar, Statistical Methods for Reliability Data (Wiley, 1998) · Nelson, Accelerated Testing (Wiley, 1990) · Rausand & Høyland, System Reliability Theory 2nd Ed. (Wiley, 2003) · MIL-HDBK-189C — Reliability Growth Management · MIL-HDBK-470A — Maintainability Design · IEC 60300 Series

Statistical Distributions

A distribution is not just a formula. It is a model of how data behaves: where values cluster, how tails behave, what kinds of outcomes are possible, and what assumptions your downstream analysis is making.

This page is designed as a world-class reference and teaching system: an 8-distribution visual studio, a 30-family continuous catalog, a 9-family discrete catalog, and a selector guide that tells users which distribution to choose and under what conditions.

Essential Distributions Studio — Visual, Formula-Driven, Example-Led

NIST/SEMATECH emphasizes that distribution choice should be supported by graphics and goodness-of-fit checks, including probability plots for competing families. This studio front-loads the distributions engineers use most often and connects each one to a graph, formula, parameter meaning, and actual engineering use case.

Choose the distribution family you want to understand
Start with the family that matches your data type and mechanism. Then validate the choice with plots and process knowledge.

Normal distribution

The normal distribution is the default model for many physical measurements when variation comes from many small additive sources. It is symmetric, bell-shaped, and fully determined by μ and σ.

Continuous · Symmetric · Mean = Median = Mode · Foundation of Cp/Cpk

Real engineering example

Coating thickness across a stable roll-to-roll process often looks approximately normal when the process is centered and major disturbances are absent. That is why capability analysis and Z-based defect estimates often start here.

f(x) = 1 / (σ√(2π)) · exp[-(x−μ)² / (2σ²)]
Bell shape means most values cluster around the center
About 68% lies within ±1σ, 95% within ±2σ, and 99.73% within ±3σ if the process truly follows a normal model.
Normal distribution graph
Support: −∞ to +∞ · Center: μ · Spread: σ · Use: measurements
Condition for use

Use it for continuous measurements when the histogram is approximately symmetric, the tails are not wildly heavy, and the normal probability plot is reasonably straight.

Lognormal distribution

A variable is lognormal when its logarithm is normally distributed. Values are strictly positive and the distribution is right-skewed, often with a long tail.

Continuous · Positive only · Right-skewed · Multiplicative effects

Real engineering example

Cycle times, repair times, particle sizes, and supplier lead times often show lognormal behavior because many multiplicative factors stretch the upper tail.

f(x) = 1 / (xσ√(2π)) · exp[-(ln x − μ)² / (2σ²)], x > 0
Right tail risk matters
A lognormal process can have a perfectly reasonable median while still producing occasional very large values in the upper tail.
Lognormal distribution graph
Condition for use

Use it when values cannot be negative and the upper tail stretches farther than the lower side; especially when multiplicative factors drive the data.

Weibull distribution

Weibull is the workhorse of life-data analysis because its shape parameter β changes the hazard behavior. That makes it useful for infant mortality, random failure, and wear-out.

Reliability · Flexible hazard · Life data · B10/B50/Bx

Real engineering example

Cycles-to-failure of tabs, seal fatigue life, or motor bearing failure times are often modeled with Weibull because the failure pattern changes across the life cycle.

f(x) = (β/η) (x/η)^(β−1) exp[-(x/η)^β], x > 0
β changes the story
β<1 suggests infant mortality, β≈1 behaves like exponential random failure, and β>1 indicates wear-out.
Weibull distribution graph
Condition for use

Use it for life / failure data when the hazard is not obviously constant and you need a flexible reliability model tied to physics of failure.

Exponential distribution

The exponential distribution models waiting times when the event rate is constant. It is memoryless, so the future does not depend on how long you have already waited.

Constant hazard · Waiting times · MTBF

Real engineering example

If rare unscheduled line stoppages occur independently at a roughly constant average rate, time-between-stoppages is often modeled exponentially.

f(x) = λ exp(−λx), x ≥ 0
Steep near zero, then decays
Short waits are more likely than long waits, but the hazard rate stays constant across time.
Exponential distribution graph
Condition for use

Use it for interarrival times and random-failure periods only when the hazard is approximately constant. If hazard changes with age, move to Weibull.

Binomial distribution

The binomial distribution models the number of successes or defectives in a fixed number of independent yes/no trials with the same probability p.

Discrete · Pass / fail · Acceptance sampling

Real engineering example

If you inspect 20 welds and each weld is either acceptable or defective, the number of defectives in the sample is binomial.

P(X=k) = C(n,k) p^k (1−p)^(n−k)
Probability mass over possible defect counts
Unlike continuous distributions, binomial places probability on whole numbers only: 0 defectives, 1 defective, 2 defectives, and so on.
Binomial distribution graph
Condition for use

Use it when you have a fixed number of independent trials, each trial has only two outcomes, and the probability of success/defect is constant.

Poisson distribution

The Poisson distribution models counts of rare events per unit area, time, volume, or opportunity when events occur independently at a constant average rate λ.

Discrete counts · Rare events · c-chart / u-chart logic

Real engineering example

Pinholes per square meter, scratches per panel, voids per electrode sheet, or complaints per day often start with a Poisson model.

P(X=k) = e^(−λ) λ^k / k!
Count distribution with right skew at low λ
When λ is small, zero and low counts dominate. As λ increases, the distribution becomes more symmetric.
Poisson distribution graph
Condition for use

Use it for counts of events per fixed opportunity when events are independent and the average rate is reasonably stable.

Student's t distribution

The t distribution is used when estimating a mean from a small sample and the population standard deviation is unknown. It has heavier tails than the normal distribution.

Inference · Small n · Unknown σ

Real engineering example

Suppose you have only 8 peel-strength results from a pilot line and need a confidence interval for the mean. That interval is built with a t critical value, not a Z critical value.

T = (X̄ − μ) / (S / √n)
Heavier tails protect against small-sample uncertainty
Lower degrees of freedom produce heavier tails. As df grows, the t distribution approaches the normal distribution.
t distribution graph
Condition for use

Use it when the sample is small and population sigma is unknown; it is a reference distribution for inference, not usually the raw data model itself.

Chi-square distribution

The chi-square distribution is built from sums of squared standard normal variables. It appears in variance confidence intervals, chi-square tests, and goodness-of-fit problems.

Variance · GOF tests · Always positive

Real engineering example

If you want a confidence interval for process variance, or you need a chi-square goodness-of-fit test for counts in categories, chi-square is the reference distribution.

χ² = Σ Z_i²
Right-skewed for low df, more spread for higher df
The distribution is always nonnegative because it is built from squared quantities.
Chi-square distribution graph
Condition for use

Use it whenever squared deviations and sample variance are central to the question, such as variance intervals and goodness-of-fit tests.

How to choose rigorously: NIST recommends comparing competing distributions with graphics such as probability plots and checking whether the selected model is consistent with the process mechanism and the observed tail behavior.
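One way to operationalise this advice: fit competing families and compare the straightness (correlation coefficient) of their probability plots. A sketch with hypothetical right-skewed data:

```python
import numpy as np
from scipy import stats

# Compare two candidate families on the same positive, right-skewed sample.
# The data are HYPOTHETICAL, simulated to stand in for e.g. cycle times.
rng = np.random.default_rng(1)
data = rng.lognormal(mean=2.0, sigma=0.5, size=60)

_, (_, _, r_norm) = stats.probplot(data, dist="norm")             # normal fit
_, (_, _, r_logn) = stats.probplot(np.log(data), dist="norm")     # lognormal check
print(f"normal plot r = {r_norm:.3f}, lognormal plot r = {r_logn:.3f}")
# the higher correlation indicates the better-fitting (straighter) family
```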

Continuous Distribution Catalog — 30 Families in Selector Studio

This catalog uses the same click-to-learn approach as the Visual Studio. Select any continuous family to see the formula, symbol explanations, characteristics, use conditions, and a larger visual preview.

continuous · positive

Gamma

f(x)=x^(k−1)e^(−x/θ)/(Γ(k)θ^k), x>0
Condition for use

Positive right-skewed data such as waiting times or accumulated damage.

Symbols

k = shape, θ = scale, Γ(k) = gamma function that generalizes factorial.

Characteristics

Strictly positive, right-skewed, flexible body and tail. As k increases, the curve becomes less skewed and more bell-like.

Real example

Time to absorb moisture to a threshold, service duration, or rainfall-like waiting quantity.

Visual intuition first: use the shape to understand support, symmetry, skew, tail behavior, and whether the distribution is continuous or discrete.

Discrete Distribution Catalog — 9 Families in Selector Studio

Select any discrete family to view its formula, symbol meanings, characteristics, use conditions, and a larger visual preview.

discrete · 0/1

Bernoulli

P(X=1)=p, P(X=0)=1−p
Condition for use

Single pass/fail trial.

Symbols

p = success probability.

Characteristics

Only two outcomes are possible. It is the atomic building block for binomial-type models.

Real example

One weld acceptable or not; one part passes or fails.

Visual intuition first: use the shape to understand support, symmetry, skew, tail behavior, and whether the distribution is continuous or discrete.

Selector Guide — Which Distribution Should I Use?

Start with the data type, then the mechanism, then the shape. This is the practical decision flow quality engineers need.

Continuous measurement, symmetric histogram

Start with Normal. Confirm with a histogram and normal probability plot.

Continuous, positive only, strong right skew

Check Lognormal, Gamma, Weibull, or Log-logistic. Use process mechanism to decide.

Time-to-failure or cycles-to-failure

Start with Weibull. Use Exponential only if the hazard appears constant. Consider Lognormal when multiplicative degradation dominates.

Pass/fail counts in fixed sample size

Use Binomial. If sampling is without replacement from a finite lot, use Hypergeometric.

Defects per unit / event counts per time

Use Poisson for rare-event counts. If variance is much larger than the mean, consider Negative Binomial.

Need confidence interval for mean with small n

Use the t distribution for the inferential step, even if the underlying raw process data are approximately normal.

Need variance interval or goodness-of-fit test

Use Chi-square. For ANOVA or variance-ratio tests, use F.

Bounded proportion from 0 to 1

Use Beta or a transform-normal bounded family such as Johnson SB when shape flexibility is needed.

Best-practice workflow

1) Plot the data. 2) Use process knowledge to narrow the candidate families. 3) Compare competing fits with probability plots or fit statistics. 4) Choose the simplest defensible model that matches both the data and the mechanism.

Design of Experiments

Design of Experiments (DOE)

A practical guide to DOE — from foundational concepts through full factorial, fractional factorial, Taguchi, and mixture designs. Every concept is illustrated with real worked examples. Pioneered by Sir Ronald A. Fisher in the 1920s and extended by Taguchi, Box, Plackett & Burman — DOE remains the most powerful process optimisation tool available to quality engineers.

What is Design of Experiments?

DOE is the simultaneous study of several process variables. Rather than changing one factor at a time, you combine multiple factors in one study — drastically reducing the amount of testing required while gaining far deeper process understanding. It is primarily a logic tool, not an advanced mathematics tool.

The Process Model — Inputs, Process & Output
[Process model diagram] INPUTS — X's (independent variables): People, Machines, Materials, Methods, Environment, Measurements → PROCESS: controlled factors (settings you can change) and noise factors (uncontrolled variation) → OUTPUT — Y (response): maximize response, minimize response, hit a target value, reduce variation, make the process robust.

Why NOT One-Factor-At-A-Time (OFAT)?

❌ OFAT — One Factor at a Time
▸ Change Temperature → measure
▸ Change Pressure → measure
▸ Change Speed → measure
▸ Cannot detect interactions between factors
▸ Wastes runs — misleading conclusions possible
✓ DOE — Simultaneous Study
▸ All combinations tested together
▸ Same data used for multiple factors
▸ Detects interactions between factors
▸ Fewer total runs for same information
▸ Builds a predictive model of the process

The 9 Steps for Analysis of Effects

Every experiment in this module follows these nine analytical steps. Steps 3–6 are skipped for unreplicated experiments, attribute data, and Taguchi S/N ratio analyses — a half-normal plot is used instead.

Nine Steps — Universal DOE Analysis Framework
Step 1 — Calculate absolute values of effects
Step 2 — Pareto chart of absolute effects
Step 3 — Std deviation of the experiment (sₑ)
Step 4 — Std deviation of effects (sEff)
Step 5 — Determine the t-statistic (t at α/2, df)
Step 6 — Decision limits (DL = t × sEff)
Step 7 — Determine significant effects
Step 8 — Graph main effects & interactions
Step 9 — Model effects → prediction equation

The 6 Objectives of DOE

📈 Maximize Response
Find settings that produce the highest output — e.g., maximum bond strength
📉 Minimize Response
Find settings that produce the lowest output — e.g., minimum defects or corrosion
🎯 Hit a Target
Adjust the process to achieve a nominal value — e.g., target wall thickness of 2.0mm
⬇️ Reduce Variation
Find settings that produce the most consistent output — lower σ, higher Cpk
🛡️ Make Process Robust
Make the response insensitive to uncontrollable noise factors — temperature drift, humidity, etc.
🔍 Identify Key Factors
Determine which variables are truly important (vital few vs. trivial many)

Key Concepts & Vocabulary

DOE has its own precise vocabulary. Understanding these terms is essential — both for exam questions and for reading DOE results correctly.

Term | Definition | Example
Factor | A controllable input variable (X) that may affect the response. Also called independent variable. | Temperature, Pressure, Vendor, Catalyst concentration
Level | The specific setting or value used for a factor in an experiment. Two-level designs use High (+) and Low (−). | Temperature: Low = 580°F, High = 600°F
Response | The output (Y) being measured and improved. Also called dependent variable. | Bond strength, Yield %, Weight loss, Hardness
Run / Treatment | A unique combination of factor levels. Each run may be performed more than once. | A+ B− = High Temp + Vendor Y
Replication | An independent repeat of a run that includes a completely new setup. Provides an estimate of inherent variation. | Running A+B+ three times from scratch
Repeat | Repetition of a run WITHOUT a new setup. Not the same as replication — does not estimate experimental error independently. | Running the same conditions back-to-back without reset
Full Factorial (2ᵏ) | All possible combinations of factor levels. 2 factors × 2 levels = 4 runs (2²). 3 factors = 8 runs (2³). | 2² design: 4 unique treatments
Main Effect | The average change in response when moving a factor from its low to its high level, averaged across all levels of other factors. | E(A) = Ȳ(A+) − Ȳ(A−) = +2.05 units
Interaction | When the effect of one factor depends on the level of another factor. If interactions are significant, the interaction plot is more meaningful than the main effect plots. | Temperature effect is +5.1 with Vendor X but −1.0 with Vendor Y
Confounding / Alias | When two effects are indistinguishable from each other because they produce identical sign patterns in the design matrix. | In a ½ fraction of a 2³, C is confounded with AB
Resolution | Describes the severity of confounding. Resolution III: main effects aliased with 2-factor interactions. Resolution V: 2-factor interactions not aliased with each other. | Res III = screening only; Res V = can estimate all interactions
Randomization | Running trials in random order to protect against unknown time-related trends or disturbances. The "insurance policy" against misleading results. | Draw numbered cards from a hat to determine run order
Blocking | Grouping experimental runs to account for a known source of variation that cannot be randomized (e.g., different batches of raw material). | Run half the trials with Batch 1, half with Batch 2
Center Points | Runs at the midpoint of all factor levels (coded value = 0). Used to detect nonlinearity/curvature and increase degrees of freedom. | If Temp range is 580–600°F, center point = 590°F
Residual | The difference between the actual observed response and the value predicted by the model. Used to validate model assumptions. | Residual = Observed − Predicted
Inherent Variation | The random background noise of a process. In DOE = "experimental error." In SPC = "common cause variation." | The natural process scatter that is always present

Quantitative vs Qualitative Factors

Quantitative Factors

Levels can be set along a continuous measurement scale. Preferred because they allow interpolation and optimization across the range. Example: Temperature (580–600°F), Time (45–90 sec), Concentration (10%–20%).

Qualitative Factors

Levels are discrete categories — a finite number of options with no natural numeric order. Example: Vendor (X vs Y), Machine type (A vs B), Operator (Shift 1 vs Shift 2). Cannot interpolate between levels.

Coded Values — The +1 / −1 System

DOE encodes factor levels as −1 (low), 0 (center), and +1 (high). This allows the same mathematical framework to work for any factor regardless of its physical units.

🔢 Coded Value Scale for Temperature (580–600°F)
[Scale diagram] −1 (Low) = 580°F · 0 (Center) = 590°F · +1 (High) = 600°F; intermediate points: 585°F = −0.5, 595°F = +0.5

Statistical Foundations for DOE

Hypothesis Testing — Type I and Type II Errors

Every DOE conclusion is a hypothesis test. Understanding error types and risks is fundamental to interpreting results correctly.

Decision | H₀ is TRUE (no real effect) | H₀ is FALSE (real effect exists)
Accept H₀ (fail to reject) | ✓ Correct — Probability = 1 − α | ✗ Type II Error — Probability = β (miss a real effect)
Reject H₀ | ✗ Type I Error — Probability = α (false alarm) | ✓ Correct — Probability = 1 − β (Power)
α (Alpha) Risk — Type I Error

Claiming a significant effect when there isn't one. A false alarm. Typical α = 0.05 means you'll incorrectly claim significance 5 times in 100.

Common: α = 0.10, 0.05, 0.01
β (Beta) Risk — Type II Error

Missing a real effect — declaring no significance when a real difference exists. Power = 1 − β. Increase sample size to reduce β.

Power = 1 − β → want this high

One-Tail vs Two-Tail Tests

📊 Three Types of Hypothesis Tests
[Diagram: rejection regions for the three test types]
Upper-tail test — H₀: μ₁ ≤ μ₂, H_A: μ₁ > μ₂. Reject H₀ if Z_calc > Z_crit (all of α in the upper tail).
Lower-tail test — H₀: μ₁ ≥ μ₂, H_A: μ₁ < μ₂. Reject H₀ if Z_calc < Z_crit (all of α in the lower tail).
Two-tail test — H₀: μ₁ = μ₂, H_A: μ₁ ≠ μ₂. Reject H₀ if |Z_calc| > Z_crit (α/2 in each tail, limits −DL and +DL).

Normal Probability Plots — Recognising Patterns

If data is normally distributed, points fall on a straight line. Deviations from the line reveal the distribution's character. The "pencil test": if a pencil covers all the points, the data is approximately normal.

📈 Normal Probability Plot — Four Common Patterns
[Diagram: four probability-plot patterns]
Normal: points fall on the line ✓
Right-skewed: bends up-left → long right tail
Left-skewed: bends down-right → long left tail
Short tails: S-shape → less variance than normal

Dean & Dixon Outlier Test

Used to detect outliers in normally distributed data before running a DOE. Data must be sorted smallest to largest. The formula used depends on sample size.

n | Test Statistic for Smallest Value | Test Statistic for Largest Value | Decision Rule
3 to 7 | r₁₀ = (X₂ − X₁) / (Xₙ − X₁) | r₁₀ = (Xₙ − Xₙ₋₁) / (Xₙ − X₁) | If r_calc > r_crit → outlier at chosen α
8 to 10 | r₁₁ = (X₂ − X₁) / (Xₙ₋₁ − X₁) | r₁₁ = (Xₙ − Xₙ₋₁) / (Xₙ − X₂) | (same rule)
11 to 13 | r₂₁ = (X₃ − X₁) / (Xₙ₋₁ − X₁) | r₂₁ = (Xₙ − Xₙ₋₂) / (Xₙ − X₂) | (same rule)
14 to 30 | r₂₂ = (X₃ − X₁) / (Xₙ₋₂ − X₁) | r₂₂ = (Xₙ − Xₙ₋₂) / (Xₙ − X₃) | (same rule)
📝

Worked Example (n = 10): Data: 1, 3, 6, 7, 8, 9, 10, 11, 12, 23. For largest value: r₁₁ = (23 − 12)/(23 − 3) = 11/20 = 0.550. Critical value r₁₁ at α=0.05 = 0.477. Since 0.550 > 0.477, the value 23 IS an outlier at 95% confidence. The smallest value 1 gives r₁₁ = 0.182 < 0.477 — not an outlier.
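The test is easy to script; this sketch reproduces the n = 10 worked example:

```python
# Dean & Dixon r11 test (valid for n = 8–10), reproducing the worked example.
data = sorted([1, 3, 6, 7, 8, 9, 10, 11, 12, 23])

r11_high = (data[-1] - data[-2]) / (data[-1] - data[1])   # suspect largest value
r11_low = (data[1] - data[0]) / (data[-2] - data[0])      # suspect smallest value
r_crit = 0.477                                            # critical value, n=10, α=0.05

print(f"largest:  r11 = {r11_high:.3f} -> {'outlier' if r11_high > r_crit else 'keep'}")
print(f"smallest: r11 = {r11_low:.3f} -> {'outlier' if r11_low > r_crit else 'keep'}")
# largest: 0.550 > 0.477 -> outlier; smallest: 0.182 < 0.477 -> keep
```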

Analysis of Variance (ANOVA)

ANOVA partitions the total variation in a dataset into components from different sources. It tests whether three or more group means are equal — a generalisation of the t-test. It produces an F-statistic: the ratio of between-group variance to within-group variance.

One-Way ANOVA — Testing One Factor

Tests whether a single factor (with 3+ levels) significantly affects the response. Assumptions: normality, independence, equal variances, interval data.

📊 One-Way ANOVA — Pressure Example (100, 110, 120 psi)
[Dot plot: group means — 100 psi x̄ = 8.2, 110 psi x̄ = 5.2, 120 psi x̄ = 4.0 — against the grand mean; F_calc = 10.97 > F_crit = 3.89 → Reject H₀]
Source | SS | df | MS | F Calculated | F Critical | Decision
Between groups | 46.8 | 2 | 23.4 | 10.97 | 3.89 | Reject H₀
Within groups (error) | 25.6 | 12 | 2.1 | | |
Total | 72.4 | 14 | | | |

Two-Way ANOVA — Testing Two Factors + Interaction

Extends one-way ANOVA to test two factors simultaneously AND their interaction. Example: Press (2 levels) × Dwell Time (3 levels).

Source | SS | df | MS | F Calculated | F Critical (α=0.05) | Decision
Rows (Press) | 1.4 | 1 | 1.4 | 0.74 | 4.75 | Fail to reject — Press NOT significant
Columns (Dwell time) | 46.3 | 2 | 23.2 | 12.21 | 3.89 | Reject H₀ — Dwell time IS significant
Rows × Columns (Interaction) | 3.5 | 2 | 1.8 | 0.95 | 3.89 | Fail to reject — No significant interaction
Within (error) | 23.3 | 12 | 1.9 | | |
Total | 74.5 | 17 | | | |
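The F critical values used in both ANOVA tables can be verified with scipy:

```python
from scipy.stats import f

# F critical values at α = 0.05 for the two ANOVA tables above.
print(f"F_crit(2, 12) = {f.ppf(0.95, 2, 12):.2f}")   # 3.89 — one-way test & dwell-time columns
print(f"F_crit(1, 12) = {f.ppf(0.95, 1, 12):.2f}")   # 4.75 — press rows
```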

2-Factor Full Factorial — Completely De-mystified

A 2² full factorial is the simplest true experiment. Two factors, each at two levels. Four unique combinations. Run them all — then the mathematics tells you exactly which factors matter, how much, and whether they interact. No guessing. No one-factor-at-a-time (OFAT) blindness.

Engineering Study — The Problem
Injection Moulding — Weld Line Strength

A plastics engineer is investigating weld line strength (MPa) in injection-moulded parts. Weld lines form where two flow fronts meet and are a known weak point. Two factors are suspected to influence strength: Melt Temperature and Injection Speed.

Goal: maximise weld line strength. Budget: 12 shots total. Each of the 4 combinations is run 3 times (replicated).

Factor Levels
Factor A — Melt Temperature
Low (−1): 230°C    High (+1): 260°C
Factor B — Injection Speed
Low (−1): 40 mm/s    High (+1): 80 mm/s
Response Y — Weld Line Strength
Units: MPa    Objective: Maximise

Step 1 — The Design Matrix & Experimental Data

Run all 4 combinations in random order (to prevent time-trend bias). Replicate each 3 times. Record the weld line strength for each shot. These are the actual results from the study:

Run | A (Temp) | B (Speed) | Coded A | Coded B | Rep 1 (MPa) | Rep 2 (MPa) | Rep 3 (MPa) | Mean Ȳ | s² (Variance)
1 | 230°C | 40 mm/s | −1 | −1 | 28.4 | 27.9 | 28.8 | 28.37 | 0.205
2 | 260°C | 40 mm/s | +1 | −1 | 33.1 | 34.0 | 33.5 | 33.53 | 0.203
3 | 230°C | 80 mm/s | −1 | +1 | 31.2 | 30.5 | 31.8 | 31.17 | 0.423
4 ★ | 260°C | 80 mm/s | +1 | +1 | 38.6 | 39.2 | 38.9 | 38.90 | 0.090

Step 2 — Visualise the Design Space

Plot the four treatment means on a 2D square. Each corner is one combination. The response values immediately reveal the pattern — and hint at whether an interaction exists.

[Diagram: 2² design square — axes: Factor A (Melt Temperature, 230°C → 260°C) and Factor B (Injection Speed, 40 → 80 mm/s). Corners: Run 1 = 28.4 MPa, Run 2 = 33.5 MPa, Run 3 = 31.2 MPa, Run 4 ★ BEST = 38.9 MPa. Effect of A at B−: 33.5 − 28.4 = +5.1 MPa; at B+: 38.9 − 31.2 = +7.7 MPa. 5.1 ≠ 7.7 → interaction!]

Step 3 — Calculate the Three Effects

Every 2² factorial has exactly three estimable effects: Main Effect A, Main Effect B, and Interaction AB. The formula is always the same: Effect = (average of high-level runs) − (average of low-level runs).

Main Effect A — Temperature
E(A) = Ȳ(A+) − Ȳ(A−)
Ȳ(A+) = (33.53 + 38.90)/2 = 36.22
Ȳ(A−) = (28.37 + 31.17)/2 = 29.77
E(A) = 36.22 − 29.77 = +6.45 MPa
Raising temperature from 230→260°C increases strength by 6.45 MPa on average.
Main Effect B — Injection Speed
E(B) = Ȳ(B+) − Ȳ(B−)
Ȳ(B+) = (31.17 + 38.90)/2 = 35.03
Ȳ(B−) = (28.37 + 33.53)/2 = 30.95
E(B) = 35.03 − 30.95 = +4.08 MPa
Increasing speed from 40→80 mm/s increases strength by 4.08 MPa on average.
Interaction Effect AB
E(AB) = Ȳ(same-sign runs) − Ȳ(opposite-sign runs)
Ȳ(same) = [Ȳ(++) + Ȳ(−−)]/2 = (38.90 + 28.37)/2 = 33.64
Ȳ(opposite) = [Ȳ(+−) + Ȳ(−+)]/2 = (33.53 + 31.17)/2 = 32.35
E(AB) = 33.64 − 32.35 = +1.29 MPa
Interaction: the combination of high temp + high speed gives extra benefit beyond additivity.
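All three effects fall out of the four run means and a handful of additions. A minimal sketch, using the run means from Step 1:

# Effect calculation for the 2x2 weld-line study, from the Step 1 run means.
# Keys are (coded A, coded B); values are mean weld line strength in MPa.
y = {(-1, -1): 28.37, (+1, -1): 33.53, (-1, +1): 31.17, (+1, +1): 38.90}

effect_A  = (y[+1, -1] + y[+1, +1]) / 2 - (y[-1, -1] + y[-1, +1]) / 2
effect_B  = (y[-1, +1] + y[+1, +1]) / 2 - (y[-1, -1] + y[+1, -1]) / 2
effect_AB = (y[+1, +1] + y[-1, -1]) / 2 - (y[+1, -1] + y[-1, +1]) / 2

print(f"E(A) = {effect_A:+.2f}, E(B) = {effect_B:+.2f}, E(AB) = {effect_AB:+.2f}")
# Matches the hand calculation: +6.45, +4.08, +1.29 MPa (up to rounding).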

Step 4 — Test for Statistical Significance

An effect that looks large might just be noise. The decision limit (DL) separates real effects from random variation. Any effect whose absolute value exceeds DL is statistically significant.

Decision Limit Calculation — Step by Step
① Experimental std dev (sₑ)
sₑ = √(mean of all variances)
= √((0.205+0.203+0.423+0.090)/4)
= √(0.230) = 0.480 MPa
② Std dev of effects (sEff)
sEff = sₑ × √(4/N)
= 0.480 × √(4/12)
= 0.480 × 0.577 = 0.277 MPa
③ Degrees of freedom
df = (reps − 1) × runs
= (3 − 1) × 4 = 8 df
④ Decision Limit (α=0.05)
DL = t(0.025, 8df) × sEff
= 2.306 × 0.277
= ±0.639 MPa
Effect | Calculated Value | |Value| | Decision Limit | Significant? | Engineering Conclusion
A — Temperature | +6.45 MPa | 6.45 | ±0.639 | ✓ YES | Temperature is the dominant factor. Run at 260°C.
B — Injection Speed | +4.08 MPa | 4.08 | ±0.639 | ✓ YES | Speed matters. Run at 80 mm/s.
AB — Interaction | +1.29 MPa | 1.29 | ±0.639 | ✓ YES | Synergy: A+B+ together gives extra benefit.
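The decision limit is equally mechanical. This sketch (assumes SciPy for the t quantile) reproduces the four boxes above:

# Decision-limit sketch for the weld-line study (Step 4), using the
# per-run variances from the Step 1 table.
import math
from scipy import stats

variances = [0.205, 0.203, 0.423, 0.090]   # s^2 per run
n_trials = 12                              # 4 runs x 3 replicates

s_e   = math.sqrt(sum(variances) / len(variances))   # experimental std dev
s_eff = s_e * math.sqrt(4 / n_trials)                # std dev of an effect
df    = (3 - 1) * 4                                  # (reps - 1) x runs
t_crit = stats.t.ppf(1 - 0.025, df)                  # two-sided alpha = 0.05

print(f"DL = +/-{t_crit * s_eff:.3f} MPa")           # +/-0.639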

Step 5 — The Interaction Plot (Most Important Graph in DOE)

When an interaction is significant, the main effects alone are misleading. Plot the response at each combination — one line per level of Factor B. Non-parallel lines = interaction. Crossing lines = strong interaction where the best level of A depends on B.

[Interaction plot: weld line strength (MPa, axis 25–45) vs melt temperature (230°C → 260°C), one line per speed. B− (40 mm/s): 28.4 → 33.5, ΔA = +5.1 MPa. B+ (80 mm/s): 31.2 → 38.9, ΔA = +7.7 MPa. Lines NOT parallel → interaction confirmed: 5.1 MPa at B− vs 7.7 MPa at B+.]

Step 6 — The Prediction Equation & Optimal Settings

Prediction Equation (coded units)
Ŷ = Grand mean + C_A·A + C_B·B + C_AB·AB

C_A = E(A)/2 = 6.45/2 = 3.225
C_B = E(B)/2 = 4.08/2 = 2.040
C_AB = E(AB)/2 = 1.29/2 = 0.645
Grand mean = (28.37+33.53+31.17+38.90)/4 = 32.99

Ŷ = 32.99 + 3.225A + 2.040B + 0.645AB
Optimal Prediction: A=+1, B=+1
Ŷ = 32.99 + 3.225(+1) + 2.040(+1) + 0.645(+1)(+1)
= 32.99 + 3.225 + 2.040 + 0.645
= 38.90 MPa ✓ (matches Run 4)
Interpolation: A=+0.5 (245°C), B=+1
Ŷ = 32.99 + 3.225(0.5) + 2.040(1) + 0.645(0.5)(1)
= 32.99 + 1.613 + 2.040 + 0.323
= 36.97 MPa
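The fitted model is just a four-term polynomial in coded units, so prediction is a one-line function:

# Prediction sketch for the fitted weld-line model (coded units).
def predict(a: float, b: float) -> float:
    """Predicted weld line strength (MPa) at coded settings a, b in [-1, +1]."""
    return 32.99 + 3.225 * a + 2.040 * b + 0.645 * a * b

print(f"{predict(+1.0, +1.0):.3f}")   # 38.900 -- matches Run 4
print(f"{predict(+0.5, +1.0):.3f}")   # 36.965 -- the ~36.97 MPa interpolation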
Engineering Conclusion

Run at 260°C melt temperature and 80 mm/s injection speed. Both main effects are significant and positive. The positive interaction (AB = +1.29) means the two factors work better together than the sum of their individual effects — there is a genuine synergy at the high-high combination. Setting A=+1, B=+1 gives the maximum predicted strength of 38.90 MPa — a 37% improvement over the worst combination (28.37 MPa at A−B−).

3-Factor Experiments — Full, Half, Quarter & Plackett-Burman

Adding a third factor multiplies complexity but unlocks far more information. A 2³ full factorial estimates 7 effects from 8 runs. When resources are limited, fractional designs cut runs in half (or more) by making smart aliasing trade-offs. This tab works one engineering study through all four design types so you can see exactly what each gives you — and what each costs you.

Engineering Study — PCB Solder Joint Strength

A process engineer is investigating solder joint shear strength (N) on a circuit board assembly line. Three process factors are suspected. The goal is to identify which factors matter and set them to maximise strength. Response: joint shear strength (N). Objective: Maximise.

Why This Study?

Weak solder joints cause field failures. One-factor-at-a-time testing found that increasing temperature helped — but only sometimes. That inconsistency is the signature of an interaction. DOE will find it.

Three Factors — Two Levels Each
A — Solder Temperature
−1: 245°C     +1: 265°C
B — Conveyor Speed
−1: 0.8 m/min     +1: 1.4 m/min
C — Flux Type
−1: Type R (rosin)     +1: Type RMA
2³ Full Factorial
8 runs · Estimates ALL 7 effects · No aliasing · Resolution = Full

Design Matrix & Data — Full Factorial

Std Order | A (Temp) | B (Speed) | C (Flux) | AB | AC | BC | ABC | Y₁ (N) | Y₂ (N) | Ȳ
1 | − | − | − | + | + | + | − | 41.2 | 40.8 | 41.00
2 | + | − | − | − | − | + | + | 49.6 | 50.2 | 49.90
3 | − | + | − | − | + | − | + | 43.1 | 42.5 | 42.80
4 | + | + | − | + | − | − | − | 55.8 | 56.4 | 56.10
5 | − | − | + | + | − | − | + | 44.3 | 43.9 | 44.10
6 | + | − | + | − | + | − | − | 52.1 | 51.7 | 51.90
7 | − | + | + | − | − | + | − | 45.6 | 46.2 | 45.90
8 ★ | + | + | + | + | + | + | + | 60.3 | 61.1 | 60.70

Calculating All 7 Effects

For any effect, the formula is: Effect = (average of Ȳ where that column = +) − (average of Ȳ where that column = −). Use the sign column for each effect:

Effect | + Runs (means) | Avg(+) | − Runs (means) | Avg(−) | Effect Value | |Effect|
A — Temperature | 49.90, 56.10, 51.90, 60.70 | 54.65 | 41.00, 42.80, 44.10, 45.90 | 43.45 | +11.20 N | 11.20
B — Speed | 42.80, 56.10, 45.90, 60.70 | 51.38 | 41.00, 49.90, 44.10, 51.90 | 46.73 | +4.65 N | 4.65
C — Flux Type | 44.10, 51.90, 45.90, 60.70 | 50.65 | 41.00, 49.90, 42.80, 56.10 | 47.45 | +3.20 N | 3.20
AB — Temp × Speed | 41.00, 56.10, 44.10, 60.70 | 50.48 | 49.90, 42.80, 51.90, 45.90 | 47.63 | +2.85 N | 2.85
AC — Temp × Flux | 41.00, 42.80, 51.90, 60.70 | 49.10 | 49.90, 56.10, 44.10, 45.90 | 49.00 | +0.10 N | 0.10
BC — Speed × Flux | 41.00, 49.90, 45.90, 60.70 | 49.38 | 42.80, 56.10, 44.10, 51.90 | 48.73 | +0.65 N | 0.65
ABC — 3-way | 49.90, 42.80, 44.10, 60.70 | 49.38 | 41.00, 56.10, 51.90, 45.90 | 48.73 | +0.65 N | 0.65
Pareto Chart — Absolute Effects vs Decision Limit
[Pareto of |effect| against the DL: A (11.20), B (4.65), C (3.20) and AB (2.85) stand clearly above the limit; BC and ABC (0.65 each) sit only just above it; AC (0.10) is noise.]
Decision Limit Calculation: each run has two replicates, so the run variances come from the Y₁/Y₂ pairs · sₑ = √(mean run variance) = √0.16 = 0.40 N · sEff = 0.40 × √(4/16) = 0.20 N · df = (2−1) × 8 = 8 · t(0.025, 8) = 2.306 · DL = 2.306 × 0.20 = ±0.46 N · Clearly significant: A (+11.20), B (+4.65), C (+3.20), AB (+2.85). BC and ABC (+0.65) are marginal and AC (+0.10) is well below the limit: treat A, B, C and AB as the active effects.
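Each of the seven effects is the same high-minus-low average, just with a different sign column. The sketch below recomputes all of them from the run means using sign products directly:

# Sketch: all seven 2^3 effects for the solder-joint study, computed from
# the run means with sign products. Keys are (coded A, coded B, coded C).
means = {(-1,-1,-1): 41.00, (+1,-1,-1): 49.90, (-1,+1,-1): 42.80, (+1,+1,-1): 56.10,
         (-1,-1,+1): 44.10, (+1,-1,+1): 51.90, (-1,+1,+1): 45.90, (+1,+1,+1): 60.70}

def effect(sign_of):
    hi = [y for run, y in means.items() if sign_of(run) > 0]
    lo = [y for run, y in means.items() if sign_of(run) < 0]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

columns = {"A": lambda r: r[0], "B": lambda r: r[1], "C": lambda r: r[2],
           "AB": lambda r: r[0] * r[1], "AC": lambda r: r[0] * r[2],
           "BC": lambda r: r[1] * r[2], "ABC": lambda r: r[0] * r[1] * r[2]}

for name, sign_of in columns.items():
    print(f"{name:>3}: {effect(sign_of):+.2f} N")
# A +11.20, B +4.65, C +3.20, AB +2.85, AC +0.10, BC +0.65, ABC +0.65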
2³⁻¹ Half Fraction
4 runs · Resolution III · Generator: C = AB · Main effects aliased with 2FIs

A half fraction runs 4 of the 8 full factorial runs. We choose which 4 by defining a generator: C = AB. This means column C is the same as column AB — so we cannot tell C apart from the AB interaction. This is called aliasing.

Alias Structure — What Gets Confounded
Generator: I = ABC
A ↔ BC  (A is aliased with BC)
B ↔ AC  (B is aliased with AC)
C ↔ AB  (C is aliased with AB)
This is Resolution III: main effects are aliased with 2-factor interactions. If BC is small (the full factorial put it at only +0.65 N), the estimate of A is nearly clean. But we must assume this — we cannot verify it from the half fraction alone.

Half Fraction Design Matrix (Std Orders 2, 3, 5, 8 from the Full Factorial: the runs with ABC = +1)

Run | A | B | C=AB | Y₁ (N) | Y₂ (N) | Ȳ | Note
1 | − | − | + | 44.3 | 43.9 | 44.10 | A−B−C+
2 | + | − | − | 49.6 | 50.2 | 49.90 | A+B−C−
3 | − | + | − | 43.1 | 42.5 | 42.80 | A−B+C−
4 ★ | + | + | + | 60.3 | 61.1 | 60.70 | A+B+C+
Effects from Half Fraction (each aliased with one 2FI)
l₁ = A + BC estimate
(49.90+60.70)/2 − (44.10+42.80)/2
= 55.30 − 43.45 = +11.85 N
≈ E(A) from full = 11.20 ✓ (the +0.65 bias is exactly BC)
l₂ = B + AC estimate
(42.80+60.70)/2 − (44.10+49.90)/2
= 51.75 − 47.00 = +4.75 N
≈ E(B) from full = 4.65 ✓ (the +0.10 bias is exactly AC)
l₃ = C + AB estimate
(44.10+60.70)/2 − (49.90+42.80)/2
= 52.40 − 46.35 = +6.05 N
⚠ C=3.20 + AB=2.85 = 6.05 — INFLATED by aliasing!
Key Lesson: The half fraction correctly identifies A and B as important (estimates close to full factorial). But the C estimate is inflated (+6.05 instead of +3.20) because it contains the AB interaction (+2.85). If you don't know AB is significant, you might wrongly conclude C is the most important factor after A. This is the aliasing trap — always check the alias structure before interpreting results.
Quarter Fraction — 2ᵏ⁻² Design
Practical from k≥5 factors · Example: 2⁵⁻² = 8 runs for 5 factors

A quarter fraction uses ¼ of the full factorial runs. For 3 factors, a quarter fraction would be only 2 runs — not useful. Quarter fractions become practical at 5+ factors: a 2⁵ full factorial needs 32 runs, but a 2⁵⁻² needs only 8 runs.

Quarter Fraction Formula & Construction
Number of runs:
N = 2ᵏ⁻² = 2ᵏ / 4
Two generators needed:
Example for 2⁵⁻²:
Generator 1: D = AB
Generator 2: E = AC
Defining relation: I = ABD = ACE = BCDE
Alias structure (Resolution III):
A ↔ BD ↔ CE
B ↔ AD ↔ CDE
C ↔ AE ↔ BDE
D ↔ AB ↔ BCE
E ↔ AC ↔ BCD
5 main effects estimable from 8 runs. The price: heavy aliasing — only use for initial screening.
Design | Factors | Runs | Resolution | What you can estimate | What's aliased
Full 2ᵏ | k | 2ᵏ | Full | All main effects AND all interactions | Nothing — complete information
Half 2ᵏ⁻¹ | k | 2ᵏ/2 | III or IV | All main effects (if Res IV); some 2FIs | Some 2FIs aliased with each other (Res IV) or with main effects (Res III)
Quarter 2ᵏ⁻² | k | 2ᵏ/4 | III | All main effects (assuming 2FIs negligible) | Main effects aliased with 2FIs — screening only
Plackett-Burman | up to N−1 | 12, 20, 24… | III | All main effects | Each main effect partially confounded with ALL 2FIs not involving it
Plackett-Burman (PB) Design
12-run design · Screens up to 11 factors · Non-geometric · Resolution III

Plackett-Burman designs are non-geometric screening designs: the run count is a multiple of 4 (not a power of 2). The 12-run PB can screen up to 11 factors — far more efficient than any 2ᵏ fractional design. The trade-off: each main effect is partially confounded with every two-factor interaction not containing that factor.

PB12 — Applied to Our Solder Study (Extended to 5 Factors)

We extend the solder study by adding 2 more factors: D = Preheat Time (30s vs 60s) and E = Board Orientation (flat vs angled). Now 5 factors. Full factorial = 32 runs. PB12 = 12 runs.

PB12 Design Matrix — First Row & Cyclic Construction

The PB12 is constructed by cycling this first row: + + − + + + − − − + −. Each subsequent row is a cyclic right-shift. Row 12 is all minuses.

Run | A (Temp) | B (Speed) | C (Flux) | D (Preheat) | E (Orient) | F* | G* | H* | J* | K* | L* | Y (N)
1 | + | + | − | + | + | + | − | − | − | + | − | 53.2
2 | − | + | + | − | + | + | + | − | − | − | + | 44.8
3 | + | − | + | + | − | + | + | + | − | − | − | 58.1
4 | − | + | − | + | + | − | + | + | + | − | − | 42.3
5 | − | − | + | − | + | + | − | + | + | + | − | 43.1
6 | − | − | − | + | − | + | + | − | + | + | + | 40.5
7 | + | − | − | − | + | − | + | + | − | + | + | 51.9
8 | + | + | − | − | − | + | − | + | + | − | + | 55.6
9 | + | + | + | − | − | − | + | − | + | + | − | 59.4
10 | − | + | + | + | − | − | − | + | − | + | + | 46.2
11 | + | − | + | + | + | − | − | − | + | − | + | 57.3
12 | − | − | − | − | − | − | − | − | − | − | − | 39.8
* Columns F–L are unused dummy columns in this 5-factor study. They can be used to estimate experimental error or screen additional factors.
PB Effect Calculation — Same Formula as Full Factorial

Effect of any factor = (mean of Y where that column is +) − (mean of Y where that column is −):

Factor | + Runs | Mean(+) | Mean(−) | Effect Estimate | Screening Decision
A — Temperature | 1,3,7,8,9,11 | 55.92 | 42.78 | +13.13 N | ✓ INCLUDE — large, positive
B — Speed | 1,2,4,8,9,10 | 50.25 | 48.45 | +1.80 N | Small — follow up only if budget allows
C — Flux Type | 2,3,5,9,10,11 | 51.48 | 47.22 | +4.27 N | ✓ INCLUDE — moderate
D — Preheat Time | 1,3,4,6,10,11 | 49.60 | 49.10 | +0.50 N | Not significant — set by convenience
E — Orientation | 1,2,5,7,9,11 | 51.62 | 47.08 | +4.53 N | ✓ INCLUDE — moderate
PB Screening Conclusion: Temperature (A) is clearly the most important factor (+13.1 N). Orientation (E) and Flux (C) are moderate and worth investigating further; Speed (B) is small and Preheat Time (D) negligible. Next step: run a focused follow-up experiment on A, C and E — a full 2³ factorial with replication — to estimate interactions. This is the sequential experimentation strategy: screen broadly with PB, then characterise deeply with full factorial on the shortlisted factors.
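The cyclic construction and the effect estimates take only a few lines of code. This sketch builds the PB12 matrix from the stated first row and reproduces the effect column above:

# Sketch: build the PB12 matrix by cycling the stated first row, then
# estimate each main effect as mean(+) - mean(-).
first = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]
rows = [first]
for _ in range(10):                       # cyclic right-shift gives rows 2..11
    prev = rows[-1]
    rows.append([prev[-1]] + prev[:-1])
rows.append([-1] * 11)                    # row 12: all minus

y = [53.2, 44.8, 58.1, 42.3, 43.1, 40.5, 51.9, 55.6, 59.4, 46.2, 57.3, 39.8]

for col, factor in enumerate("ABCDE"):    # the first five columns carry factors
    hi = [yi for row, yi in zip(rows, y) if row[col] > 0]
    lo = [yi for row, yi in zip(rows, y) if row[col] < 0]
    print(f"{factor}: {sum(hi)/6 - sum(lo)/6:+.2f} N")
# A +13.13, B +1.80, C +4.27, D +0.50, E +4.53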

Choosing Your Design — Decision Framework

?
How many factors are you studying?
2–4 factors
Full Factorial 2ᵏ
4–16 runs. No aliasing. Estimate everything. Best choice when interactions are expected and budget allows.
5–7 factors
Half Fraction 2ᵏ⁻¹
16–32 runs. Resolution IV or V. Main effects clear. Some 2FIs estimable. Good balance of efficiency and information.
6–10 factors
Quarter Fraction 2ᵏ⁻²
8–16 runs. Resolution III. Main effects only (assuming 2FIs negligible). Screening only — follow up on winners.
8–20 factors
Plackett-Burman
12–24 runs. Resolution III. Highly efficient screening. Main effects partially confounded with all 2FIs. Always follow up.

Screening & Fractional Factorial Designs

When you have many potential factors, running a full factorial is impractical — a 2⁷ design requires 128 runs. Screening designs let you study 5–15+ factors in far fewer runs by deliberately aliasing (confounding) higher-order interactions with main effects. The goal is to identify the vital few factors that drive most of the variation, then follow up with a focused optimisation study.

Core Concept — Resolution & Aliasing
Resolution III
Main effects aliased with 2FI
Use only for screening when 2-factor interactions are assumed negligible.
Resolution IV
Main effects clear; 2FI aliased with 2FI
Main effects are not aliased with 2FIs — a good balance of economy and interpretability.
Resolution V
Main effects & 2FI clear; 3FI aliased
Main effects and 2-factor interactions are both estimable. Preferred for optimisation follow-up.

Sixteenth Fraction: 2⁷⁻⁴ Screening Design — 8 Runs for 7 Factors

A plastics injection moulding team suspects 7 process variables affect warpage. A full 2⁷ requires 128 runs — weeks of production time. A saturated 2⁷⁻⁴ Resolution III design needs only 8 runs.

Factor | Label | Low (−1) | High (+1)
Melt Temperature | A | 220°C | 260°C
Injection Speed | B | 60 mm/s | 100 mm/s
Hold Pressure | C | 40 MPa | 80 MPa
Hold Time | D | 5 s | 15 s
Cooling Time | E | 10 s | 25 s
Gate Size | F | Small | Large
Mould Temp | G | 30°C | 60°C

The 2⁷⁻⁴ design uses a base 2³ design in A, B, C — then assigns D=AB, E=AC, F=BC, G=ABC. This is a saturated Resolution III design: every main effect is aliased with two-factor interactions (D with AB, E with AC, and so on), so interpreting the main effects requires assuming those interactions are negligible.

Run | A | B | C | D=AB | E=AC | F=BC | G=ABC | Warpage (mm)
1 | −1 | −1 | −1 | +1 | +1 | +1 | −1 | 0.42
2 | +1 | −1 | −1 | −1 | −1 | +1 | +1 | 0.61
3 | −1 | +1 | −1 | −1 | +1 | −1 | +1 | 0.38
4 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | 0.55
5 | −1 | −1 | +1 | +1 | −1 | −1 | +1 | 0.47
6 | +1 | −1 | +1 | −1 | +1 | −1 | −1 | 0.58
7 | −1 | +1 | +1 | −1 | −1 | +1 | −1 | 0.44
8 | +1 | +1 | +1 | +1 | +1 | +1 | +1 | 0.72

Calculating Main Effect Estimates

Each main effect = (average of high runs − average of low runs). For factor A (Melt Temperature):

Effect A = ¼[(0.61+0.55+0.58+0.72) − (0.42+0.38+0.47+0.44)]
Effect A = ¼[2.46 − 1.71] = ¼[0.75] = +0.188
Melt temperature at the high level increases warpage by ~0.19 mm on average (divide by 4 because each level average pools 4 of the 8 runs).
Factor | Effect Estimate | Abs. Effect | Verdict
A — Melt Temperature | +0.188 | 0.188 | ★ Active
B — Injection Speed | +0.003 | 0.003 | Inert
C — Hold Pressure | +0.063 | 0.063 | Marginal
D — Hold Time | +0.038 | 0.038 | Inert
E — Cooling Time | +0.008 | 0.008 | Inert
F — Gate Size | +0.053 | 0.053 | Marginal
G — Mould Temp | +0.048 | 0.048 | Marginal
Screening outcome: A (Melt Temperature) is clearly active. C (Hold Pressure), F (Gate Size) and G (Mould Temp) are marginal — and at Resolution III each is aliased with two-factor interactions — so they are worth carrying into a follow-up study. B, D and E can be held at convenient settings. The team now runs a focused follow-up on A plus the marginal factors: a handful of runs instead of the original 128. The sketch below reproduces these estimates.
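A minimal numpy sketch: build the three base columns, derive the four generated ones, and difference the means.

# Sketch: generate the 2^(7-4) design from its base 2^3 and generators,
# then estimate the seven main effects with numpy.
import numpy as np

A = np.array([-1, +1, -1, +1, -1, +1, -1, +1])
B = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
C = np.array([-1, -1, -1, -1, +1, +1, +1, +1])
D, E, F, G = A * B, A * C, B * C, A * B * C      # the four generators

warpage = np.array([0.42, 0.61, 0.38, 0.55, 0.47, 0.58, 0.44, 0.72])

for name, col in zip("ABCDEFG", [A, B, C, D, E, F, G]):
    eff = warpage[col == +1].mean() - warpage[col == -1].mean()
    print(f"{name}: {eff:+.3f} mm")
# Matches the table above (A +0.188, B +0.003, C +0.063, ...).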

Plackett-Burman Designs

Plackett-Burman (PB) designs are Resolution III screening designs that study up to N−1 factors in N runs, where N is a multiple of 4 (12, 20, 24, 28…). They are more economical than fractional factorials for large factor counts but have complex aliasing — every main effect is partially aliased with every 2-factor interaction not involving that factor.

Design | Runs | Max Factors | Resolution | Best Used For
PB-12 | 12 | 11 | III | Rapid screening; 2FI negligible assumption
PB-20 | 20 | 19 | III | Large screening studies
2⁴⁻¹ | 8 | 4 | IV | 4-factor screening; 2FI estimable with follow-up
2⁵⁻² | 8 | 5 | III | 5-factor screening; main effects only
2⁶⁻² | 16 | 6 | IV | 6-factor study; cleaner aliasing than PB
2⁷⁻³ | 16 | 7 | IV | 7-factor screening with good resolution
2⁷⁻⁴ | 8 | 7 | III | Maximum economy; 7 factors in 8 runs

Design Selection Decision Guide

Choose Fractional Factorial when…
  • You want clean, interpretable aliasing
  • You may need Resolution IV or V
  • Factor count is modest (4–8 factors)
  • You anticipate a follow-up optimisation study
Choose Plackett-Burman when…
  • You have 9–19 factors to screen
  • Resources are very limited
  • 2-factor interactions are expected to be small
  • You only need to identify the vital few factors
Practitioner rules of thumb
  • Always randomise run order to protect against lurking time trends.
  • Add 2–4 centre points to check for curvature without inflating run count.
  • Use a half-normal plot to visually separate active effects from noise.
  • If a 2FI is important, upgrade to Resolution V or run a follow-up fold-over.
  • Screen first, optimise second — never skip directly to RSM on 8+ factors.

Taguchi Methods

Genichi Taguchi developed a system for improving quality by designing processes that are robust — insensitive to noise factors like temperature drift, humidity, and raw material variation. His philosophy: it is cheaper to design robustness in than to control every noise factor in production.

🎯 Taguchi's View of a Process — Signal, Noise, and Response
[Diagram: controllable signal factors (set by the engineer) and uncontrollable noise factors (temperature, humidity, material lot) both feed the PROCESS, which produces the RESPONSE (Y). Target: on-target with low variation. Goal: make Y insensitive to noise by choosing the right control factor levels.]

Signal-to-Noise (S/N) Ratios

The S/N ratio is the primary Taguchi response metric. A higher S/N = a more robust product/process. The formula depends on the optimization objective.

Objective | S/N Formula | Use When | Example
Smaller is Better | S/N = −10 log(Σy²/n) | Defects, contamination, noise, corrosion, error | Minimise weight loss in corrosion test
Larger is Better | S/N = −10 log(Σ(1/y²)/n) | Strength, yield, throughput, efficiency | Maximise bond strength, chemical yield
Nominal is Best | S/N = 10 log(ȳ²/s²) | Dimensional tolerances, target values | Hit target wall thickness of 3.0 mm ± 0.1
Ordered Categorical | S/N based on scores | Attribute data with ranked categories | Defect severity: none / minor / major / critical
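The three quantitative formulas are one-liners in code. A minimal sketch; the reading list is a hypothetical wall-thickness sample near the 3.0 mm target, not data from the table:

# S/N ratio sketch for the three quantitative Taguchi objectives.
import math

def sn_smaller_is_better(y):
    return -10 * math.log10(sum(v * v for v in y) / len(y))

def sn_larger_is_better(y):
    return -10 * math.log10(sum(1 / (v * v) for v in y) / len(y))

def sn_nominal_is_best(y):
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / (len(y) - 1)   # sample variance
    return 10 * math.log10(mean * mean / var)

readings = [2.98, 3.01, 3.02, 2.99]   # hypothetical wall thickness, target 3.0
print(f"Nominal-is-best S/N = {sn_nominal_is_best(readings):.1f} dB")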

Taguchi Orthogonal Arrays

Taguchi developed standardised balanced designs called orthogonal arrays (L4, L8, L9, L12, L16…). The notation L₈(2⁷) means: 8 runs, up to 7 factors, each at 2 levels.

Array | Runs | Max Factors | Levels | Has Interaction Table? | Best Use
L4 | 4 | 3 | 2 | Yes | Quick 3-factor screen; plastic sealing example
L8 | 8 | 7 | 2 | Yes (2⁷) / No (mixed 4¹×2⁴ variant) | Standard 2-level screening; steel heat-treat example
L9 | 9 | 4 | 3 | No | 3-level factors; plastic processing with 4 factors
L12 | 12 | 11 | 2 | No | Main effects only; interactions roughly distributed to all columns
L16 | 16 | 15 | 2 | Yes | Large 2-level screening

Accuracy vs Precision — Taguchi's Starting Point

🎯 Four Combinations of Accuracy and Precision
  • Accurate + Precise ✓ — the goal
  • Precise, not accurate — fix with adjustment
  • Accurate, not precise — reduce variation via DOE
  • Neither — worst case ✗ — both DOE and adjustment needed

Mixture Designs

Mixture designs are used when the factors are components of a mixture that must sum to a constant (typically 100% or 1.0). The response depends on the proportions of ingredients, not their absolute amounts. Common in chemicals, food, pharmaceuticals, and polymer formulation.

⚠️

Key constraint: x₁ + x₂ + x₃ + … = 1. Because of this constraint, standard factorial designs cannot be used directly — you cannot independently vary all components. The feasible experimental region is a simplex (triangle in 3D, tetrahedron in 4D).

🔺 Three-Component Mixture Design — The Simplex
[Diagram: three-component simplex. Vertices: A = 1.0, B = 1.0, C = 1.0 (pure components). Edge midpoints: 0.5A/0.5B, 0.5A/0.5C, 0.5B/0.5C. Centroid: ⅓A ⅓B ⅓C. Simplex with quadratic + cubic design points.]
Design Type | Points Included | Model Fitted | Use When
Simplex Design | Vertices only (pure components) | Linear | First screening — assume no blend effects
Simplex Centroid | Vertices + midpoints + centroid | Quadratic / Cubic | When blend synergism or antagonism is likely
Simplex Lattice | Evenly spaced grid across simplex | Polynomial (degree q) | Space-filling coverage; complex response surfaces
Extreme Vertices | Constrained vertices + centroid | Quadratic / special cubic | When components have upper/lower bounds (real formulations)
💡

Blown Film Example: A polymer film is made from three components (A, B, C) that must total 100%. The team runs a three-component quadratic simplex design and measures tensile strength. The model identifies the optimal blend ratio that maximises strength — something impossible to find with OFAT or standard factorial designs.

DOE Quick Reference — Exam Summary

Design Selection Guide

🗺️ Which Design Should I Use?
How many factors? And do you need to detect interactions?
  • 2–4 factors AND you need all interactions → Full Factorial 2ᵏ — 4 factors = 16 runs (2⁴) or 8 runs with replication (2³)
  • 5–8 factors, some interactions → Fractional Factorial (Res V or higher) — 2⁵⁻¹ = 16 runs (Res V); identifies all main effects + 2FIs
  • 7+ factors, main effects only → Plackett-Burman screening (Res III) — 12 runs → 11 factors; identify the vital few for follow-up studies
  • Need robustness against noise? → Taguchi Orthogonal Array (L4, L8, L9, L12…)

Key Formulas at a Glance

Quantity | Formula | Notes
Main Effect E(A) | E(A) = Ȳ(A+) − Ȳ(A−) | Average response at high level minus average at low level
Std dev of experiment | sₑ = √(Σs²/k) | k = number of runs; s² = variance per run
Std dev of effects | sEff = sₑ × √(4/n) | n = total number of trials
Degrees of freedom | df = (obs/run − 1) × runs | If obs/run − 1 = 0, use multiplier of 1
Decision limit | DL = t(α/2, df) × sEff | Effects outside ±DL are statistically significant
F-test (variances) | F = s²_larger / s²_smaller | Larger variance always in numerator → one-tail test
Nonlinearity effect | E(NL) = Ȳ_center − Ȳ_grand | Significant → linear model invalid; need ≥3 levels
Residual | Res = Y_observed − Y_predicted | Used for residual analysis in unreplicated designs

Common Pitfalls to Avoid

Trap | Correct Understanding
Repeat vs Replication | Repeat = same conditions, no new setup (does NOT estimate experimental error). Replication = independent new setup (DOES estimate error).
When interaction is significant | The interaction plot is MORE important than the main effect plots. Main effects describe averages; the interaction describes the joint effect.
Hierarchy rule | If an interaction AB is significant, include both main effects A and B in the model — even if A or B alone are not significant.
Significant nonlinearity | If centre points show significant nonlinearity, the linear model is invalid and you cannot interpolate. Must repeat with ≥3 levels.
Variation vs Mean analysis | A factor can be insignificant for the mean but critically important for reducing variation. Always run both analyses.
Resolution III designs | Main effects are aliased with 2-factor interactions. You can identify which factors matter, but you cannot separate main effects from interactions.
Randomisation purpose | Randomisation protects against unknown time-related trends. It is the "insurance policy" — not optional.
OFAT advantage claimed | OFAT CANNOT detect interactions between factors. This is a fundamental limitation, not a minor one. DOE is always better when interactions are possible.
Factor C (ramp time) in yield example | C was not significant for the mean, but was critical for reducing variation. The "diamond factor" — rare and extremely valuable.

DOE Procedural Checklist (10 Practical Rules)

01 Define the objective (maximise, minimise, hit target, reduce variation) before running any experiment
02 Complete MSA before DOE — a bad gauge makes a capable process look incapable
03 Stabilise the process with SPC before running DOE — special causes inflate experimental error
04 Set factor levels boldly in screening — wide spacing makes significant factors easier to detect
05 Always randomise trial order — it is the insurance policy against unknown external influences
06 Run centre points to check for nonlinearity in quantitative factors (at least 4 centre points)
07 Verify the model prediction at the recommended conditions before implementing process changes
08 Non-significant factors are set on the basis of cost, productivity, or convenience only
09 Execute a line clearance before and after DOE to prevent mix-ups or commingling of products
10 Report conclusions in plain language — your audience understands Pareto charts, not t-statistics
Design for Six Sigma · From Concept to Commercial Success

Design for Six Sigma (DFSS)

DFSS is not an improvement methodology — it is a design methodology. Where DMAIC fixes a broken process, DFSS builds the right process from scratch. Used when you are creating something new: a product, a service, a manufacturing line. The goal is to design quality in, not inspect it out.

What is DFSS — and when do you use it?

DFSS answers one question: "How do we build a product that is right first time, every time, at the right cost?" It is not a repair kit. It is a design philosophy applied before a single part is cut.

✓ Use DFSS when...
  • Designing a completely new product or service
  • Existing process cannot meet new requirements
  • Entering a new market or technology domain
  • Customer requirements are not yet fully understood
  • Target sigma level is ≥ 4.5σ from the start
⚠ Use DMAIC instead when...
  • An existing process is underperforming
  • Root cause is unknown but process exists
  • Incremental improvement is the goal
  • Product design is already locked
  • Defect rate needs reduction in current production

The Four DFSS Methodologies — Side by Side

DFSS is not one framework — it is a family. Different industries and organisations use different variants. All share the same core philosophy.

Methodology | Phases | Best For | Origin
DMADV | Define · Measure · Analyse · Design · Verify | New product or process design — the most widely taught | GE, Motorola
IDOV | Identify · Design · Optimise · Validate | Hardware-heavy design; aerospace, automotive | Six Sigma Academy
DMADOV | Define · Measure · Analyse · Design · Optimise · Verify | Complex multi-stage designs needing explicit optimisation loop | Honeywell
CDOV | Concept · Design · Optimise · Verify | Product platform design, systems engineering | Creveling
💡

Which should you use? DMADV is the best starting point — it maps cleanly to the Six Sigma belt structure, has the richest toolset documentation, and is recognised across industries. This module teaches DMADV throughout, with notes on where the others differ.

DFSS vs DMAIC — The Core Difference

DMAIC
Fix what exists

You have a process. It is producing defects. You investigate, find root causes, implement solutions. Improvement happens on an existing platform.

vs
DMADV / DFSS
Build what doesn't exist yet

You have a customer need. Nothing exists yet. You translate that need into requirements, generate concepts, select and optimise the best one, then validate it meets the requirements.

📌

The 70% rule: It is widely cited that 70–80% of a product's quality and cost is determined at the design stage. DFSS is the methodology that addresses this window — before tooling is cut, before supply chains are locked, before the cost of change becomes prohibitive.

The DMADV Roadmap — Phase by Phase

Each phase of DMADV has a clear deliverable, a gate review question, and a defined set of tools. You cannot progress to the next phase without answering the gate question. This is what keeps DFSS honest.

D
01
Define
Gate: Is this the right project?

What you do: Establish project scope, business case, customer segments, and high-level requirements. Define what success looks like in measurable terms.

Key tools: Project charter · SIPOC · VOC (interviews, surveys) · Kano model · Business case with ROI
M
02
Measure
Gate: Do we understand customer needs?

What you do: Translate Voice of Customer into Critical to Quality (CTQ) characteristics. Benchmark competitors. Establish target performance levels with measurable specifications.

Key tools: CTQ tree · QFD House of Quality · Competitive benchmarking · Target specification table · Kano classification
A
03
Analyse
Gate: Have we selected the best concept?

What you do: Generate multiple design concepts. Use structured methods to evaluate and select the best. Identify critical design parameters and their relationships to CTQs.

Key tools: Pugh concept selection · TRIZ · Morphological chart · Design FMEA (risk identification) · Transfer function mapping
D
04
Design
Gate: Does the design meet targets?

What you do: Develop the detailed design. Run DOE to optimise critical parameters. Apply tolerance design and Design for Manufacture/Assembly (DFM/DFA). Predict capability.

Key tools: DOE (factorial, RSM) · Taguchi robustness · Tolerance stack-up · Monte Carlo simulation · DFM/DFA · Predicted Cpk
V
05
Verify
Gate: Is it ready for full production?

What you do: Validate the design against customer requirements using prototypes and pilot runs. Confirm predicted capability with real data. Hand off to production with full control plan.

Key tools: Pilot run capability study · MSA · Control plan · PFMEA · Design validation testing · Ppk confirmation

Voice of Customer — From Feedback to Specification

VOC is the most underinvested step in most organisations. Teams rush to design solutions before truly understanding the problem. DFSS forces you to slow down here — because every hour spent understanding customers saves ten hours of redesign later.

Step 1 — Gather VOC Data

Direct Methods
  • Customer interviews (structured)
  • Focus groups
  • Field observation (Gemba)
  • Prototype feedback sessions
Indirect Methods
  • Warranty & complaint data
  • Online reviews mining
  • Sales team feedback
  • Regulatory requirements
Competitive Intel
  • Teardown analysis
  • Patent landscape
  • Benchmarking studies
  • Industry standards review

Step 2 — Kano Model: Not All Requirements Are Equal

The Kano model sorts customer requirements into three categories. Knowing which category each requirement falls into prevents over-engineering the basics and missing the delighters.

⚠️
Must-Be

Expected basics. Their presence doesn't delight — their absence causes immediate rejection. Example: a car must start reliably.

📈
Performance

More is better. Directly proportional to satisfaction. Example: fuel economy — customers always want more.

Delighter

Not expected, but creates strong positive reaction. Example: automatic parking — customers didn't ask, but love it.

Step 3 — CTQ Tree: Translate Words into Numbers

A CTQ tree converts vague customer language into specific, measurable engineering requirements. Each branch goes from customer need → driver → specification.

Example: Medical Infusion Pump
Customer Need | Driver | CTQ Specification
"I need to know the pump is working correctly" | Alarm reliability | Alarm response ≤ 2 seconds, 100% of the time
"I need it to be easy to carry" | Portability | Weight ≤ 800 g, handle grip force ≤ 15 N

Step 4 — QFD: Linking Customer Needs to Design Parameters

Quality Function Deployment (QFD) — also called the House of Quality — ensures every engineering decision can be traced back to a customer requirement. It prevents the classic trap of designing what is technically elegant rather than what is actually needed.

Customer Need | Importance (1–5) | Design Parameter | Relationship | Target
Light weight | ⭐⭐⭐⭐⭐ 5 | Enclosure material density | Strong (9) | ≤ 1.5 g/cm³
Accurate dosing | ⭐⭐⭐⭐⭐ 5 | Pump mechanism tolerance | Strong (9) | ±0.5% dose accuracy
Long battery life | ⭐⭐⭐⭐ 4 | Motor efficiency | Medium (3) | ≥ 72 hr at standard rate
Alarm is audible | ⭐⭐⭐⭐ 4 | Speaker output power | Strong (9) | ≥ 75 dB at 1 m

Concept Design — Generating and Selecting the Best Idea

This is where most engineers spend too little time. The quality of your final design is bounded by the quality of your concept space. If you evaluate only one concept, you are not designing — you are just executing an assumption.

Morphological Chart — Systematic Concept Generation

A morphological chart forces you to decompose the design problem into independent sub-functions and generate alternatives for each. Combining one option from each row creates a unique concept.

Sub-function | Option A | Option B | Option C
Power source | Rechargeable Li-ion | Disposable alkaline | Mains powered
Pump mechanism | Peristaltic | Syringe driver | Rotary gear
Display type | LCD numeric | OLED graphic | LED indicator only
Alarm | Audible buzzer | Vibration + audible | Wireless to receiver
Housing material | ABS plastic | Polycarbonate | Aluminium alloy

The above chart yields 3⁵ = 243 possible concepts. You don't evaluate all of them — you use engineering judgment to select 3–5 promising combinations for formal comparison.

Pugh Concept Selection — Structured Comparison Against a Datum

The Pugh matrix evaluates concepts against criteria using a datum (reference concept, often the current design or market leader). Scores: + (better), − (worse), S (same).

Criterion | Weight | Datum (Concept A) | Concept B | Concept C | Concept D
Weight | 5 | D | + | S | +
Battery life | 4 | D | S | + | −
Dose accuracy | 5 | D | + | S | +
Alarm clarity | 4 | D | S | + | S
Manufacturability | 3 | D | − | S | +
Weighted score | | 0 | +14 | +13 | +11
💡

The Pugh matrix does not give you the answer — it structures your thinking. Concept B scores highest, but notice its manufacturability weakness. The right response is not to blindly select B, but to ask: "Can we redesign B to address manufacturability while keeping its weight and accuracy advantages?"

Transfer Functions — Linking Design to CTQ

A transfer function is a mathematical relationship: CTQ = f(design parameters). You must establish this before running experiments. Without it, you cannot predict the effect of design changes.

Example: Pump dose accuracy
Dose Volume = (Motor speed × Stroke length × Cross-section area) / Mechanical efficiency
Y = f(RPM, L, A, η)
Each parameter becomes a factor in the DOE. The transfer function tells you which factors matter most.

Design Optimisation — Finding the Best Settings

Once you have a chosen concept and transfer functions, you optimise. This means running designed experiments to find the factor settings that simultaneously maximise performance and minimise sensitivity to variation.

The Two-Step Optimisation Strategy (Taguchi)

Step 1 — Minimise Variation

Find the factor settings that make the output least sensitive to noise (uncontrollable variation). Use Signal-to-Noise ratio as the optimisation metric. Fix these settings first.

Step 2 — Hit the Target

With variation minimised, use a scaling factor (a factor that affects mean but not variance) to move the mean to the target. This preserves the robustness gained in Step 1.

Signal-to-Noise Ratios — Choosing the Right One

Characteristic | S/N Formula | When to use | Example
Smaller-the-Better | −10·log(Σy²/n) | Defect counts, vibration, shrinkage — zero is ideal | Dimensional deviation, leakage rate
Larger-the-Better | −10·log(Σ(1/y²)/n) | Strength, yield, life — more is always better | Tensile strength, battery life
Nominal-the-Best | 10·log(µ²/σ²) | Target value with symmetric tolerance | Shaft diameter, fill volume, resistance

Response Surface Methodology (RSM)

When factors are continuous and you need to find an optimal point (not just compare levels), RSM maps the response across the design space. It answers: "At exactly what values of A and B is Y maximised?"

Central Composite Design (CCD)

2ᵏ factorial + star points (±α) + centre points. Fits a full quadratic model. Best for 2–5 continuous factors. Rotatable — equal prediction variance at equal distance from centre.

Box-Behnken Design (BBD)

Midpoints of cube edges + centre points. Never tests extreme corners — safer when extreme combinations are physically dangerous or impossible. Fewer runs than CCD for k ≥ 3.

📌

The RSM optimum is not the same as "maximise the CTQ." You optimise Value minus Cost. A material that gives 5% better strength but costs 40% more may not be the right choice. Always include cost in the optimisation objective.

Tolerance Design and Variation Management

Tolerances are not free. Too tight — manufacturing cost explodes. Too loose — the product fails in the field. Tolerance design finds the optimal balance using statistical methods rather than engineering gut feel.

Tolerance Stack-Up Analysis

When multiple components assemble together, their individual dimensional variations combine. The question is: what is the probability the assembly falls within its specification?

Worst-Case Method
T_assembly = Σ|Tᵢ|

Guarantees 100% of assemblies work, but assumes all parts are at their worst-case limits simultaneously. Very conservative — drives unnecessarily tight component tolerances.

Statistical (RSS) Method
σ_assembly = √(Σσᵢ²)

Accounts for the fact that all parts being at worst-case simultaneously is extremely unlikely. Allows looser component tolerances for the same assembly yield. Requires knowledge of σᵢ per component.
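The two methods differ only in how the component tolerances combine. A minimal sketch with four hypothetical ±tolerances, treating each as a ±3σ band for the RSS case:

# Tolerance stack-up sketch: four hypothetical components in a linear stack.
import math

tolerances = [0.10, 0.05, 0.08, 0.12]             # +/- mm per component

worst_case = sum(tolerances)                       # guaranteed envelope
# RSS: treat each +/-T as a +/-3 sigma band, so sigma_i = T_i / 3
rss = 3 * math.sqrt(sum((t / 3) ** 2 for t in tolerances))

print(f"Worst case: +/-{worst_case:.3f} mm")       # +/-0.350
print(f"RSS (3-sigma): +/-{rss:.3f} mm")           # +/-0.182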

Propagation of Variance — The Design Engineer's Formula

If the CTQ (Y) is a function of multiple input variables (X₁, X₂, ...), how does variation in the inputs propagate to variation in Y?

σ²_Y ≈ Σᵢ (∂Y/∂Xᵢ)² · σ²_Xᵢ

The partial derivative (∂Y/∂Xᵢ) is the sensitivity coefficient — how much Y changes per unit change in Xᵢ. Squared and multiplied by the variance of Xᵢ.

💡

Practical insight: The sensitivity coefficient squared means that the dominant source of variation in Y is often one or two inputs with high sensitivity — not all inputs equally. Focus tolerance investment on the highest-sensitivity parameters.

Monte Carlo Simulation for Tolerance Verification

When the transfer function is complex or non-linear, analytical propagation is difficult. Monte Carlo simulation draws random values from each input distribution, computes Y, and builds up a Y distribution from thousands of trials.

5 Steps
  1. Define distributions for each input (X₁, X₂, ...) — mean and std dev from capability data
  2. Randomly sample one value from each input distribution
  3. Compute Y using the transfer function
  4. Record the Y value. Repeat 10,000+ times.
  5. The resulting Y distribution gives you predicted Cpk, % out-of-spec, and percentiles
📌

Monte Carlo answers the question your tolerance stack-up cannot: "What is the actual predicted yield of this assembly design, given real component capability data?" Use it before committing to tooling.
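The five steps map directly onto a vectorised simulation. The sketch below uses the pump transfer function from the Analyse section with illustrative input distributions and an assumed ±2% specification; all numbers are placeholders, not project data.

# Monte Carlo tolerance-verification sketch (assumes numpy). Input means,
# sigmas and the spec width are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

rpm    = rng.normal(120.0, 0.20, n)    # motor speed
stroke = rng.normal(8.00, 0.010, n)    # stroke length, mm
area   = rng.normal(12.0, 0.020, n)    # cross-section, mm^2
eta    = rng.normal(0.92, 0.002, n)    # mechanical efficiency

dose = rpm * stroke * area / eta       # transfer function Y = f(X)

nominal = 120.0 * 8.00 * 12.0 / 0.92
lsl, usl = 0.98 * nominal, 1.02 * nominal          # +/-2% spec (assumed)

sigma = dose.std(ddof=1)
ppk = min(usl - dose.mean(), dose.mean() - lsl) / (3 * sigma)
out_of_spec = 100 * np.mean((dose < lsl) | (dose > usl))

print(f"Predicted Ppk = {ppk:.2f}, out of spec = {out_of_spec:.3f}%")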

Verify — Confirming the Design Works in the Real World

Verification is not the last step — it is the proof that all previous steps were done correctly. A strong Verify phase should produce no surprises. If it does, it means the Analyse or Design phases were incomplete.

Verification vs Validation — Know the Difference

Verification

"Did we build it right?"

Confirms the design meets its specifications. Compares actual measurements to design targets. Typically done on prototypes and pre-production units.

Validation

"Did we build the right thing?"

Confirms the design meets customer needs in real use conditions. Typically done with real users in real environments. Answers the VOC question from Phase 1.

Capability Confirmation — The Ppk Requirement

The pilot run is your first real capability data. Minimum requirement: Ppk ≥ 1.67 for new designs going to production (some industries require ≥ 2.00). Calculate Ppk — not Cpk — because Ppk includes all sources of long-term variation.

Index | Formula | What it tells you | Target
Cp | (USL−LSL)/(6σ̂) | Potential: does the spec window fit the process? | ≥ 2.00 for new design
Cpk | min(Cpu, Cpl) | Short-term actual: centred and capable? | ≥ 1.67 for new design
Ppk | min(Ppu, Ppl) using s_total | Long-term actual: including all drift and shifts | ≥ 1.33 in production

Design Scorecard — Closing the Loop

Every CTQ identified in Measure must be verified in this phase. The design scorecard maps each requirement to its measured result.

CTQ | Target | Tolerance | Measured | Ppk | Status
Dose accuracy | 0% deviation | ±0.5% | ±0.31% | 1.82 | ✓ Pass
Weight | 750 g | ≤ 800 g | 763 g | | ✓ Pass
Alarm response | 1.2 s | ≤ 2.0 s | 1.4 s | 2.1 | ✓ Pass
Battery life | 80 hr | ≥ 72 hr | 77 hr | | ⚠ Monitor

DFSS Toolbox — When to Use What

Phase | Tool | Purpose | Output
Define | Project Charter | Scope, timeline, team, business case | Signed charter document
Define | SIPOC | High-level process map | Scope boundaries
Define | VOC methods | Capture customer language before interpreting it | Raw VOC statements
Measure | Kano model | Classify requirements by type | Kano chart
Measure | CTQ tree | Translate VOC to measurable specs | CTQ specifications with LSL/USL
Measure | QFD / House of Quality | Link customer needs to engineering parameters | Prioritised design parameters
Analyse | Morphological chart | Systematic concept generation | Concept alternatives
Analyse | Pugh matrix | Structured concept selection | Winning concept with rationale
Analyse | Design FMEA | Identify design failure risks early | Risk register + mitigation actions
Design | Screening DOE | Identify the vital few factors | Significant factors list
Design | Taguchi / Robust design | Minimise sensitivity to noise | Robust parameter settings
Design | RSM / CCD | Find optimal factor settings | Contour plots, optimal point
Design | Tolerance design | Allocate tolerances statistically | Component tolerance targets
Verify | Pilot run Ppk study | Confirm capability in production | Ppk ≥ 1.33
Verify | MSA / GR&R | Confirm measurement system is adequate | %GR&R ≤ 10%
Verify | Design scorecard | Close the loop on every CTQ | Pass/fail per requirement

Full Project Walkthrough: Designing a Smart Water Meter

Follow one product through the complete DMADV process — from customer complaint to production-ready design. This is the kind of project a Black Belt would lead over 6–9 months.

The Brief

A utility company wants to replace 500,000 mechanical water meters with smart digital meters over 5 years. Current meters have a 12% annual replacement rate due to reading errors, jamming, and battery failure. The project team must design a new smart meter that customers trust and engineers can manufacture to ≥ 4.5σ.

DEFINE

Business case: 12% replacement rate × 500,000 meters × £85/replacement = £5.1M/year avoidable cost. Reducing to 2% saves £4.1M/year. Project charter signed. Team: 1 Black Belt, 2 Green Belts, design engineer, manufacturing engineer, customer service lead.

Scope
New meter design only — no installation process
Timeline
9 months to pilot, 18 months to full launch
Target
Annual replacement rate ≤ 2% within 3 years
MEASURE

VOC gathered from 80 interviews (householders, plumbers, meter readers, utility managers). Top themes:

Customer Voice | Kano Type | CTQ Specification
"I need to trust the reading is accurate" | Must-Be | Reading accuracy ±0.5% of actual volume
"It should last without maintenance" | Must-Be | Battery life ≥ 10 years at standard transmission rate
"I want to see my usage on my phone" | Performance | Data transmission ≤ 15 min latency, 99.5% uptime
"No leaks around the meter body" | Must-Be | IP68 rated — 1 m immersion for 30 min, zero leakage
"Easy to read without bending down" | Delighter | Remote reading via app — no physical access needed
ANALYSE

Three concepts generated from morphological chart, then evaluated in Pugh matrix:

Concept | Flow sensor | Comms | Battery | Pugh score
A — Ultrasonic (datum) | Ultrasonic | LoRaWAN | Li-thionyl | 0 (datum)
B — Magnetic | Magnetic | NB-IoT | Li-thionyl | −7
C — Ultrasonic + NB-IoT | Ultrasonic | NB-IoT | Li-SOCl₂ | +18 ✓ Selected

Key insight from DFMEA: Ultrasonic transducer bond failure identified as top risk (RPN 280). Mitigation: change from adhesive bond to mechanical clamp with O-ring seal. RPN reduced to 48 after redesign.

DESIGN

DOE results: L9 Taguchi OA run on 4 factors (transducer gap, signal frequency, temperature compensation algorithm, housing wall thickness). Two CTQs measured: reading accuracy and signal strength.

Factor | Effect on Accuracy | Effect on Signal | Optimal Setting
Transducer gap | Significant ✓ | Not significant | 8.5 mm ± 0.2 mm
Signal frequency | Significant ✓ | Significant ✓ | 1.0 MHz
Temp. compensation | Significant ✓ | Not significant | Algorithm v3 (quadratic)
Wall thickness | Not significant | Significant ✓ | 3.5 mm (min weight)

Tolerance design: Monte Carlo simulation (10,000 runs) with production Cpk data from transducer supplier predicts assembly accuracy Ppk = 1.87 — exceeding the 1.67 target. Transducer gap tolerance tightened from ±0.5 mm to ±0.2 mm based on sensitivity analysis.

VERIFY

Pilot run: 200 units manufactured at supplier. Full measurement on all CTQs.

CTQ | Target | Pilot Result | Ppk | Status
Reading accuracy | ±0.5% | ±0.28% avg | 1.93 | ✓ Pass
Battery life (projected) | ≥ 10 yr | 12.3 yr (accelerated test) | | ✓ Pass
Transmission latency | ≤ 15 min | 4.2 min avg | | ✓ Pass
IP68 seal integrity | Zero failures | 0/200 failures | | ✓ Pass

Project outcome: Design approved for full production. Projected annual replacement rate: 1.8% — below the 2% target. Estimated annual saving vs current state: £4.3M. Full deployment over 5 years. DFSS project closed.

DFSS Quick Reference

Phase | Gate Question | Key Deliverable | Common Mistake
Define | Is this the right problem? | Signed project charter | Scope too broad — fix the scope first
Measure | Do we understand the customer? | CTQ specifications with LSL/USL | Going straight to solutions before completing VOC
Analyse | Is this the best concept? | Selected concept with rationale | Evaluating only one concept — not a selection
Design | Does the design meet targets? | Optimised design with predicted Cpk | Optimising mean without addressing variation
Verify | Is it ready for production? | Ppk ≥ 1.33 on all CTQs | Verifying on prototype, not production tooling

10 Rules That Separate Good DFSS from Bad DFSS

  1. VOC before solutions. You cannot design the right thing if you haven't confirmed what "right" means to the customer.
  2. Measurable CTQs only. "Reliable" is not a CTQ. "Zero failures in 10 years at 95% confidence" is.
  3. At least 3 concepts. One concept is not a selection — it is an assumption with extra steps.
  4. Transfer functions before experiments. Know what you are testing and why before running a single trial.
  5. Optimise variation before mean. A process on target with high variance will drift off target. A robust process stays on target.
  6. Tolerance design is not the last step. Do it during Design, not after all decisions are made.
  7. Ppk, not Cpk, for verification. Cpk is a short-term study. Production will never be as controlled as a capability study.
  8. Design FMEA before prototype. Find failure modes on paper, not in the field.
  9. Gate reviews are not approval ceremonies. Each gate question must have a data-backed answer — not a slide.
  10. DFSS ends at design handoff, not project close. Track production Ppk for 3 months post-launch to confirm predictions.

Advanced: Strategic Experimentation & Value Engineering

This section covers H.E. Cook's DFSS as Strategic Experimentation (SE) approach — a powerful extension that translates experimental results directly into financial projections. Used by teams who need to connect engineering decisions to boardroom metrics: price, market share, and cash flow.

The Three Fundamental Metrics

Cook's insight: in any competitive market, three conditions are always true about your current product. Use them as your strategic compass.

📉
VALUE is too LOW
V(g) — customer willingness to pay

Improve attributes customers actually value

💸
COST is too HIGH
C — variable cost per unit

Reduce variable cost through design choices

🐢
INNOVATION is too SLOW
1/δt — product introduction rate

Compress development cycle with DFSS

Universal Competitive Metric (Cook)
U ≡ (V − C) / δt

Your U must be ≥ your best competitor's U. Improve value, reduce cost, and speed up innovation simultaneously.

Value Curves — Quantifying What Customers Will Pay

The value curve V(g) answers: "If we improve this attribute by X%, how much more will the customer pay?" This converts engineering decisions into price and demand projections.

NIB — Nominal is Best

Interior dimensions, shaft diameter. Ideal is a specific target value. Value decreases if too high or too low.

SIB — Smaller is Better

Defects, vibration, noise. Ideal is zero. V(0) = maximum. Value decreases monotonically.

LIB — Larger is Better

Fuel economy, battery life, strength. Ideal is infinity. V increases with attribute — diminishing returns.

From Experiment to Cash Flow — The Lambda Framework

Lambda (λ) coefficients connect experimental results to financial outcomes. Each λ tells you: "What is the projected change in value, cost, or cash flow if this factor changes from baseline to its experimental level?"

The core formula
λ̂ = XS · Y    where    XS = [X'X]⁻¹ · X'

X is the design matrix, Y is the vector of experimental outcomes, λ̂ gives the projected effect of each factor on each strategic outcome
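In matrix terms this is ordinary least squares. A minimal numpy sketch with a 2² design matrix and hypothetical outcomes; the point is the XS = [X'X]⁻¹·X' mechanics, not the numbers:

# Lambda-estimation sketch: XS = (X'X)^-1 X' applied to hypothetical outcomes.
import numpy as np

# Columns: intercept, factor A, factor B (coded +/-1); one row per run.
X = np.array([[1, -1, -1],
              [1, +1, -1],
              [1, -1, +1],
              [1, +1, +1]], dtype=float)
Y = np.array([31.0, 35.5, 33.2, 39.1])      # hypothetical experimental outcomes

XS = np.linalg.inv(X.T @ X) @ X.T           # the XS matrix defined above
lam = XS @ Y                                # grand mean and per-factor coefficients
print(lam)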

💡

The full SE methodology — including Monte Carlo cash-flow simulation, Cournot-Bertrand pricing, and the DV survey method — is mathematically rigorous and beyond most DFSS projects. It is most valuable in oligopoly markets where small value improvements translate to large market share shifts. Reference: H.E. Cook, Design for Six Sigma as Strategic Experimentation (ASQ Quality Press).

Defense Quality Standards

Military & Defense Quality Standards

Key U.S. military and NATO defense quality standards — with full coverage of MIL-STD-1916 (DoD Preferred Methods for Acceptance of Product), including all sampling tables, worked examples, and switching rules.

MIL-STD-1916 — DoD Preferred Methods for Acceptance of Product

Published 1 April 1996. The fundamental philosophy shift: away from AQL-based detection (sampling to find defects) toward prevention-based quality systems (SPC, process control, continuous improvement).

💡

The core philosophy (Foreword §7): "Contractors are responsible for establishing their own manufacturing and process controls. Contractors are expected to use recognized prevention practices such as process controls and statistical techniques." Sampling inspection alone does not control or improve quality — it is redundant when effective process controls exist.

Two Acceptance Paths

Path A — Contractor-Proposed

Submit a prevention-based quality system as alternate to sampling. Must demonstrate:

  • Documented quality system plan
  • Process focus (SPC, FMEA, PDCA evidence)
  • Objective evidence of effectiveness
  • Cpk: Critical≥2.00, Major≥1.33, Minor≥1.00
Path B — Acceptance by Tables

Use the prescribed sampling plans indexed by Verification Level and Code Letter. Three plan types:

  • Table II — Attributes (lot/batch)
  • Table III — Variables (lot/batch)
  • Table IV — Continuous attributes

Verification Levels (VL-I through VL-VII)

VL prescribes the level of significance of a characteristic. VL-VII = highest effort (most critical), VL-I = lowest. Specified in the contract or product specifications.

VL | Significance | Attributes n (CL-A lot) | Variables n (CL-A lot)
T (Tightened) | Highest — tightened inspection | 3072 | 113
VII | Critical | 1280 | 87
VI | Very high | 512 | 64
V | High | 192 | 44
IV | Moderate | 80 | 29
III | Standard | 32 | 18
II | Below standard | 12 | 9
I | Minimum | 5 | 4
R (Reduced) | Minimum — reduced inspection | 3 | 2

Critical Characteristic Requirements (§4.4)

For each critical characteristic, the contractor MUST implement an automated screening or fail-safe manufacturing operation AND apply sampling plan VL-VII to verify performance. When a critical nonconformance is found at any phase:

  • Immediately prevent delivery to Government
  • Notify Government representative
  • Identify the cause
  • Take corrective action
  • Screen ALL available units
🚨

Zero tolerance on critical characteristics. No AQL exists for critical characteristics in MIL-STD-1916 — the acceptance criterion is zero nonconformances, reinforced by automated screening.

📋 Key Definitions (§3)

  • Critical Characteristic

    Must be met to avoid hazardous conditions OR to assure tactical function of major systems (aircraft, tank, missile).

  • Major Characteristic

    Must be met to avoid failure or material reduction of usability. One step below critical.

  • Minor Characteristic

    Departure not likely to reduce usability materially. Least stringent.

  • Verification Level (VL)

    VL-VII = highest sampling effort. VL-I = lowest. Set by contract.

  • Production Interval

    Period of continuous sampling assumed homogeneous quality. Normally a single shift, max one day.

  • Cpk Thresholds (§4.1.2b)

    Critical: ≥2.00   Major: ≥1.33   Minor: ≥1.00 — required for alternate acceptance method.

💡

New to acceptance sampling? The Sampling Theory tab explains OC curves, AQL, RQL, producer/consumer risk, and the mathematics behind these tables — read it first for the full picture.

MIL-STD-1916 Sampling Tables

Three matched plan types — all indexed by VL and Code Letter. The Code Letter (CL) is determined from lot size using Table I.

Table I — Code Letters by Lot Size and VL

Lot Size | VL-VII | VL-VI | VL-V | VL-IV | VL-III | VL-II | VL-I
2–170 | A | A | A | A | A | A | A
171–288 | A | A | A | A | A | A | B
289–544 | A | A | A | A | A | B | C
545–960 | A | A | A | A | B | C | D
961–1,632 | A | A | A | B | C | D | E
1,633–3,072 | A | A | B | C | D | E | E
3,073–5,440 | A | B | C | D | E | E | E
5,441–9,216 | B | C | D | E | E | E | E
9,217–17,408 | C | D | E | E | E | E | E
17,409–30,720 | D | E | E | E | E | E | E
30,721+ | E | E | E | E | E | E | E

Table II — Attributes Sampling (Zero Acceptance)

Acceptance criterion: zero nonconformances in the sample. If any found → reject lot.

CL | T (Tightened) | VII | VI | V | IV | III | II | I | R (Reduced)
A | 3072 | 1280 | 512 | 192 | 80 | 32 | 12 | 5 | 3
B | 4096 | 1536 | 640 | 256 | 96 | 40 | 16 | 6 | 3
C | 5120 | 2048 | 768 | 320 | 128 | 48 | 20 | 8 | 3
D | 6144 | 2560 | 1024 | 384 | 160 | 64 | 24 | 10 | 4
E | 8192 | 3072 | 1280 | 512 | 192 | 80 | 32 | 12 | 5

Table III — Variables Sampling (k and F Criteria)

CL | T | VII | VI | V | IV | III | II | I | R
Sample sizes (nv)
A | 113 | 87 | 64 | 44 | 29 | 18 | 9 | 4 | 2
B | 122 | 92 | 69 | 49 | 32 | 20 | 11 | 5 | 2
C | 129 | 100 | 74 | 54 | 37 | 23 | 13 | 7 | 2
D | 136 | 107 | 81 | 58 | 41 | 26 | 15 | 8 | 3
E | 145 | 113 | 87 | 64 | 44 | 29 | 18 | 9 | 4
k values (one- or two-sided)
A | 3.51 | 3.27 | 3.00 | 2.69 | 2.40 | 2.05 | 1.64 | 1.21 | 1.20
E | 3.76 | 3.51 | 3.27 | 3.00 | 2.69 | 2.40 | 2.05 | 1.64 | 1.21
F values (two-sided double spec only)
A | .136 | .145 | .157 | .174 | .193 | .222 | .271 | .370 | .707
E | .128 | .136 | .145 | .157 | .174 | .193 | .222 | .271 | .370

Variables Acceptance Criteria (§5.2.2.2.3)

Single-sided spec — k criterion
(x̄ − LSL) / s ≥ k
(USL − x̄) / s ≥ k
k values from Table III above (derived from the MIL-STD-414 / ANSI Z1.9 family)
Double-sided spec — Form 1
QL = (x̄ − L) / s
QU = (U − x̄) / s
Both QL and QU must be ≥ k, and the F criterion on s must also be satisfied.

Switching Rules — Normal / Tightened / Reduced

Inspection intensity is not fixed — it responds to demonstrated supplier quality history. Good history earns reduced sampling. Poor performance triggers tightened inspection.

📊 MIL-STD-1916 Inspection Switching Flow
[Flow: NORMAL (starting point; contract VL applies directly) → TIGHTENED (VL shifted left, larger samples) when 2 of 5 lots fail; back to NORMAL after 5 consecutive lots pass with the cause corrected. NORMAL → REDUCED (VL shifted right, smaller samples) after 10 consecutive passes plus Government approval; back to NORMAL if any lot is rejected or production becomes irregular. DISCONTINUATION: remaining tightened too long → Government may halt acceptance.]

Switching Rules — Detailed Criteria

Transition | Trigger (Lot/Batch) | Additional Requirement
Normal → Tightened | 2 lots withheld within last 5 lots |
Tightened → Normal | 5 consecutive lots accepted | Cause for nonconformances corrected
Normal → Reduced | 10 consecutive lots accepted | Steady production rate + Govt. approval
Reduced → Normal | Any 1 lot withheld | OR: irregular production, unsatisfactory QS
Discontinuation | Stays tightened (repeated fails) | Govt. may halt all acceptance
📌

When sampling restarts after discontinuation, it begins at tightened inspection — not normal. Switching procedures are applied independently for each group of characteristics or individual characteristic.

Worked Examples from MIL-STD-1916 Appendix

Example 1 — Attributes Sampling (Wing Nuts, VL-IV)

📋

Inspection for missing thread. VL-IV specified. Table II, attributes plan. Lot sizes vary.

Lot # | Lot Size | CL | Sample n | NCRs Found | Disposition | Stage | Action
1 | 5,000 | D | 160 | 2 | Withhold | N | Start at normal VL-IV
2 | 900 | A | 80 | 0 | Accept | N |
3 | 3,000 | C | 128 | 1 | Withhold | N | 2/5 fail → switch to Tightened
4 | 1,000 | B | 256 | 0 | Accept | T |
5 | 1,000 | B | 256 | 0 | Accept | T |
6 | 900 | A | 192 | 0 | Accept | T |
7 | 2,000 | C | 320 | 0 | Accept | T |
8 | 2,500 | C | 320 | 0 | Accept | T | 5 consec. pass → back to Normal
9 | 3,000 | C | 128 | 0 | Accept | N |
10 | 5,000 | D | 160 | 0 | Accept | N |

Example 2 — Variables, Single-Sided Spec (VL-I)

Maximum operating temperature = 209°F on a circuit board relay. Lot of 40 units. VL-I specified, CL-A → nv = 4, k = 1.64 (from Table III).

Step-by-Step — Variables Sampling, Single-Sided
Step 1 — Measure sample: 197, 188, 184, 205 °F
Step 2 — x̄ = (197+188+184+205) ÷ 4 = 193.5 °F
Step 3 — s = √[Σ(xᵢ−x̄)² ÷ (n−1)] = √(265÷3) = 9.399
Step 4 — Quality Index Q = (USL − x̄) ÷ s
          Q = (209 − 193.5) ÷ 9.399 = 1.649
Step 5 — Compare Q ≥ k: 1.649 ≥ 1.64 ✅
ACCEPT LOT — Q = 1.649 exceeds k = 1.64. The sample mean is sufficiently far from the upper spec limit relative to the process spread. If Q had been < 1.64, the lot would be withheld regardless of whether any individual measurement exceeded 209°F.
Why variables sampling is powerful here: An attributes plan at VL-I CL-A needs n=12 (zero-accept). Variables needs only n=4 — a 67% reduction in sample size — because it uses the actual measurement values, not just pass/fail.

Example 3 — Variables, Double-Sided Spec (VL-I)

Same relay batch. Temperature must stay within 180–209°F. Same 4 measurements: 197, 188, 184, 205. Both QL ≥ k and F criterion must be satisfied.

Lower Quality Index QL
QL = (x̄ − LSL) / s
= (193.5 − 180) / 9.399
= 1.436
vs k = 1.64 → ✗ FAIL
Upper Quality Index QU
QU = (USL − x̄) / s
= (209 − 193.5) / 9.399
= 1.649
vs k = 1.64 → ✅ PASS
F Criterion Check (double-sided only)
F = s / (USL − LSL) = 9.399 / (209−180) = 9.399 / 29 = 0.324
Table F value at VL-I, CL-A = 0.370
Check: F ≤ F_table → 0.324 ≤ 0.370 ✅ PASS
WITHHOLD LOT — QL = 1.436 fails the k criterion (1.64). Even though QU passes and the F criterion passes, both quality indices must pass for a double-sided spec. The process mean is sitting too close to the lower limit. Disposition: 100% screen lot or return to supplier.
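Both dispositions follow from one small routine that applies the k criterion on each specified side and the F criterion when the spec is double-sided. A sketch reproducing Examples 2 and 3:

# MIL-STD-1916 variables acceptance sketch (k and F criteria).
import statistics

def variables_accept(sample, k, f_table, lsl=None, usl=None):
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    ok = True
    if usl is not None:
        ok &= (usl - xbar) / s >= k          # upper quality index QU
    if lsl is not None:
        ok &= (xbar - lsl) / s >= k          # lower quality index QL
    if lsl is not None and usl is not None:  # F criterion, double-sided only
        ok &= s / (usl - lsl) <= f_table
    return ok

temps = [197, 188, 184, 205]
print(variables_accept(temps, k=1.64, f_table=0.370, usl=209))            # True: accept
print(variables_accept(temps, k=1.64, f_table=0.370, lsl=180, usl=209))   # False: withhold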

Example 4 — Continuous Sampling (Spot Welds, VL-II)

📋

CL-C, VL-II → i=116 (clearance number), f=1/48 (sampling frequency).

Item # | Action | Stage
1 | Start 100% screening. i=116. | N
8 | Found defective unit — reset counter. | N
124 | 116 consecutive conforming units cleared → begin sampling f=1/48 | N
9,697 | 200 consecutive conforming sampled → switch to Reduced f=1/68 | R
13,982 | Production interval tripled → CL-C to CL-E, f=1/136 | R
16,290 | Nonconforming unit found → switch to Normal, restart screening i=228 | N
16,518 | 228 consecutive conforming cleared → sampling f=1/96 | N

Key Military & Defence Standards — Deep Reference

Beyond MIL-STD-1916, six standards define how defence contractors predict, test, and manage reliability and safety. Each one has a direct commercial equivalent — knowing both is essential for cross-sector work.

MIL-HDBK-217F — Reliability Prediction of Electronic Equipment

Published 1991. The DoD's framework for predicting failure rates of electronic components and systems during design. Two prediction methods exist — choose based on design maturity.

Method 1 — Parts Count

Used in early design when full stress analysis isn't possible. Requires: component quantities, generic quality level, and use environment. Quick and conservative.

λ_s = Σ(Nᵢ · λ_Gᵢ · πQᵢ)
Method 2 — Parts Stress

Used for detailed design when actual operating stresses are known. More accurate but requires thermal, electrical, and environmental stress data per component.

λ_p = λ_b · πT · πE · πQ · πA
Worked Example — Resistor Failure Rate (Parts Stress)
Component: Carbon film resistor, 1/4 W rating, operating at 0.125 W (50% stress ratio)
Base failure rate: λ_b = 0.0012 failures/10⁶ hours
Temperature factor: πT = 2.8 (85°C junction temp)
Environment factor: πE = 4.0 (GM Ground Mobile)
Quality factor: πQ = 1.0 (MIL-R-11 qualified)
─────────────────────────────────────────────
λ_p = 0.0012 × 2.8 × 4.0 × 1.0 = 0.01344 failures/10⁶ hrs
MTBF = 1 / λ_p = 74.4 million hours (single resistor)

πE is the dominant multiplier — ground mobile environment is 4× more harsh than ground benign. Reducing operating temperature from 85°C → 55°C would cut πT from 2.8 → 1.4, halving the failure rate.
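
The parts-stress arithmetic is a straight product of the base rate and π factors. A quick sketch using the example's values (a real prediction pulls λ_b and every π factor from the handbook's tables):

def lambda_parts_stress(lambda_b, pi_t, pi_e, pi_q, pi_a=1.0):
    """MIL-HDBK-217F parts-stress form: failures per 10^6 hours."""
    return lambda_b * pi_t * pi_e * pi_q * pi_a

lam = lambda_parts_stress(0.0012, pi_t=2.8, pi_e=4.0, pi_q=1.0)
print(f"lambda_p = {lam:.5f} /10^6 h, MTBF = {1 / lam:.1f} million hours")
# -> lambda_p = 0.01344 /10^6 h, MTBF = 74.4 million hours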

MIL-STD-1629A — FMECA (Failure Mode Effects & Criticality Analysis)

The military extension of commercial FMEA. Adds a quantitative Criticality Number and a Criticality Matrix that plots every failure mode visually by severity and probability. Long required on major defence system acquisitions; although cancelled in 1998, it remains the de facto FMECA reference.

Severity Categories
I — CatastrophicDeath / system loss
II — CriticalSevere injury / major damage
III — MarginalMinor injury / minor damage
IV — NegligibleNo injury / negligible damage
Criticality Number Formula
Cm = β × α × λp × t
β = conditional prob of loss
α = failure mode ratio
λp = part failure rate
t = operating time
Criticality Matrix — Hydraulic Brake System FMECA Example

Each failure mode is plotted by severity category (x-axis) vs criticality number Cm (y-axis). Modes in the upper-left require immediate design action.

Failure Mode | Severity | β | α | λp (×10⁻⁶/hr) | t (hrs) | Cm (×10⁻³) | Priority
Seal leak → loss of pressure | I | 0.9 | 0.35 | 4.2 | 2,000 | 2.646 | 🔴 Redesign
Caliper piston stick | II | 0.7 | 0.20 | 3.1 | 2,000 | 0.868 | 🟡 Action
Brake fade under load | III | 0.5 | 0.30 | 2.8 | 2,000 | 0.840 | 🔵 Monitor
Warning light false trigger | IV | 1.0 | 0.15 | 5.0 | 2,000 | 1.500 | 🟢 Accept

Note: A Severity I mode always demands action regardless of Cm value. High Cm on a Severity IV mode (warning light) is acceptable — it's a nuisance, not a safety hazard.
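
Each Cm in the table is one multiplication. A quick Python check of the brake-system rows (λp entered in failures/hour, result shown ×10⁻³ to match the table):

def criticality(beta, alpha, lambda_p, t):
    """MIL-STD-1629A criticality number: Cm = beta * alpha * lambda_p * t."""
    return beta * alpha * lambda_p * t

modes = {
    "Seal leak":                   (0.9, 0.35, 4.2e-6, 2000),
    "Caliper piston stick":        (0.7, 0.20, 3.1e-6, 2000),
    "Brake fade under load":       (0.5, 0.30, 2.8e-6, 2000),
    "Warning light false trigger": (1.0, 0.15, 5.0e-6, 2000),
}
for name, args in modes.items():
    print(f"{name}: Cm = {criticality(*args) * 1e3:.3f} x 10^-3")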

MIL-STD-882E — System Safety

The DoD system safety standard. Required for all acquisitions. Defines hazard identification, risk assessment, and risk management for hardware, software, and human factors. Risk = f(Severity, Probability).

MIL-STD-882E Risk Assessment Matrix
Probability ↓ / Severity → | Cat I — Catastrophic | Cat II — Critical | Cat III — Marginal | Cat IV — Negligible
A — Frequent | 1 High | 3 High | 7 Serious | 13 Medium
B — Probable | 2 High | 5 High | 9 Serious | 16 Medium
C — Occasional | 4 High | 6 Serious | 11 Medium | 18 Low
D — Remote | 8 Serious | 10 Medium | 14 Medium | 19 Low
E — Improbable | 12 Medium | 15 Medium | 17 Medium | 20 Low

High risk (red) = Unacceptable — programme stop until mitigated; formal acceptance only at Component Acquisition Executive level. Serious = acceptance at Programme Executive Officer level. Medium and Low = acceptable with Programme Manager approval.

Real-World Application — F-35 OBIGGS System

The On-Board Inert Gas Generation System (OBIGGS) prevents fuel tank explosions by replacing ullage with nitrogen-enriched air. Under MIL-STD-882E, a failure of OBIGGS is Severity Cat I (catastrophic — fuel tank explosion). Probability was classified as D (Remote) given redundant sensors and pre-flight checks, giving a risk index of 8 (Serious). The programme invested in a secondary inerting monitor to reduce probability to E (Improbable), moving the risk to 12 (Medium) and lowering the required acceptance authority. This drove the system architecture decision to add the backup monitor.

MIL-STD-810H — Environmental Engineering & Laboratory Tests

The definitive environmental testing standard — now used extensively in commercial product ruggedisation (laptops, phones, industrial equipment) not just defence. 29 test methods covering every environmental stress a product might encounter.

Method | Test | Typical Conditions | Real-World Stress
500.6 | Low Pressure (Altitude) | 70,000 ft equivalent | Aircraft cargo bay, unpressurised
501.7 | High Temperature | +71°C storage, +49°C operating | Desert deployment, vehicle interior
502.7 | Low Temperature | −51°C storage, −32°C operating | Arctic operations, stratospheric
507.6 | Humidity | 95% RH, 30 days cycling | Tropical jungle, ship deck
509.7 | Salt Fog | 5% NaCl, 96 hrs | Naval/maritime environment
510.7 | Sand & Dust | 1.06 g/m³ dust concentration | Middle East desert, helicopter downwash
514.8 | Vibration | Tailored PSD per platform | Vehicle road, aircraft turbulence
516.8 | Shock | Half-sine, sawtooth, trapezoidal | Rough handling, explosive nearby

MIL-STD-785B — Reliability Programme for Systems & Equipment

The lifecycle reliability management standard. Defines the tasks, reviews, and evidence a contractor must demonstrate across programme phases from concept through production.

Task 101–106
Reliability programme planning, monitoring, control, failure reporting (FRACAS), corrective action
Task 201–205
Design guidelines, stress analysis, sneak circuit analysis, effects of functional testing
Task 301–303
Reliability development testing, environmental stress screening (ESS), reliability qualification
💡

FRACAS (Failure Reporting, Analysis, and Corrective Action System) is the heart of MIL-STD-785B. Every failure in test or field must be formally reported, root-caused, and corrective action verified — creating a closed feedback loop that drives reliability growth throughout the programme.

AS9100D — Aerospace Quality Management System

ISO 9001 + 60+ aerospace-specific requirements. The entry ticket for Boeing, Airbus, Lockheed Martin, Northrop Grumman, and most tier-1 primes. Mandatory for the civil aerospace supply chain globally.

Key additions over ISO 9001
  • First Article Inspection (FAI) per AS9102
  • Foreign Object Damage/Debris (FOD) prevention
  • Key Characteristics (KC) identification and control
  • Configuration management requirements
  • Counterfeit parts prevention (clause 8.1.4)
  • On-time delivery as a quality metric
Certification hierarchy
  • AS9100D — Design & manufacture
  • AS9110C — MRO / maintenance organisations
  • AS9120B — Distributors / stockists
  • Audited by IAQG-accredited CBs (BSI, Bureau Veritas, etc.)
  • Certificate validity: 3 years with annual surveillance

Military Standards Quick Reference

Standard | Topic | Commercial Equivalent | Status
MIL-STD-1916 | DoD Preferred Acceptance Methods | ISO 2859 / ANSI Z1.4 | Active (1996)
MIL-STD-785B | Reliability Program Mgmt | IEC 60300-2 | Cancelled (1998), still widely used
MIL-HDBK-217F | Electronic Reliability Prediction | IEC TR 62380, Telcordia SR-332 | Active (frozen)
MIL-STD-1629A | FMECA | AIAG FMEA, SAE J1739 | Cancelled (1998), still widely used
MIL-STD-105E | Attribute Acceptance Sampling | ANSI/ASQ Z1.4, ISO 2859 | Cancelled (1995)
MIL-STD-414 | Variables Acceptance Sampling | ANSI/ASQ Z1.9, ISO 3951 | Cancelled
MIL-STD-45662A | Calibration Systems | ISO/IEC 17025, ISO 10012 | Cancelled
MIL-STD-882E | System Safety | IEC 61508, SAE ARP4761 | Active
MIL-STD-810H | Environmental Testing | IEC 60068, RTCA DO-160 | Active
AS9100D | Aerospace QMS | ISO 9001 + Aerospace CSR | Active (Rev D)
AQAP-2110 | NATO Quality Assurance | ISO 9001 + NATO CSR | Active (Ed. 3)
📌

MIL-STD-1916 supersedes MIL-STD-414 (variables sampling) and MIL-STD-1235 (single- and multi-level continuous sampling) for DoD use. The key difference from MIL-STD-105E: 1916 uses a zero-acceptance criterion (Ac = 0 always) versus 105E's AQL-based accept numbers. 1916 is philosophically aligned with prevention and SPC; 105E was detection-based.

Acceptance Sampling Theory — Errors, AOQ, AOQL, ATI & Dodge-Romig

The mathematics behind acceptance sampling — understanding what happens to quality as lots pass through a sampling plan, and the trade-offs between producer and consumer risk.

Type I & Type II Errors — Producer's Risk vs Consumer's Risk

Decision \ Reality | Actual: Good Lot | Actual: Bad Lot
Decision: Accept | ✅ Correct | ✗ Type II Error (β)
Decision: Reject | ✗ Type I Error (α) | ✅ Correct

Aspect | Type I Error (α) | Type II Error (β)
Name | Producer's risk | Consumer's risk
What happens | Good lot rejected — producer loses | Bad lot accepted — consumer receives defectives
Fire alarm analogy | False alarm — inconvenience | Missed fire — disaster
Control method | Fixed at a pre-determined level (1%, 5%, 10%) | Controlled to <10% by appropriate sample size
Simple definition | Innocent declared guilty | Guilty declared innocent
💡

As α (producer's risk) increases (e.g. 0.01→0.05), β (consumer's risk) goes down — they trade off against each other. To reduce BOTH Type I and II errors simultaneously: increase the sample size.

RQL / LTPD — Rejectable Quality Level

RQL = Rejectable Quality Level (= LTPD = LQL)

The defect rate we want to reject a high proportion of the time (controlled by β, the consumer's risk).

Consumer Risk β = P(accept lot | lot has RQL% defectives)

Example: β = 0.10, RQL = 8% means we would accept lots containing 8% defectives at most 10% of the time. Equivalently: 90% of lots at RQL quality will be rejected.

AQL vs RQL on the OC Curve

The OC Curve has three zones:

  • Acceptable quality zone — near AQL, high P(accept)
  • ⚠️ Indifferent zone — between AQL and RQL, intermediate P(accept)
  • Rejectable quality zone — near RQL/LTPD, low P(accept)

Increasing n (sample size) steepens the OC curve — narrows the indifferent zone and brings it closer to the ideal step function.
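
The OC curve is just a binomial tail sum, so the steepening effect is easy to demonstrate. A minimal sketch comparing two illustrative plans with the same Ac/n ratio (the larger sample discriminates more sharply between 1% and 5% lots):

from math import comb

def p_accept(n, ac, p):
    """OC curve point: P(accept) = P(defectives in sample <= Ac), binomial."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(ac + 1))

for n, ac in [(50, 1), (200, 4)]:
    print(n, ac, {p: round(p_accept(n, ac, p), 3) for p in (0.01, 0.03, 0.05)})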

Interactive OC Curve — See How n and Ac Shape Acceptance Probability

Adjust the sample size (n) and acceptance number (Ac) to see how the Operating Characteristic curve changes. A steeper curve gives sharper discrimination between good and bad lots — but costs more to inspect.

[Interactive widget: defaults n = 80, Ac = 2, AQL = 1.5%. Readouts: Pa at AQL, producer's risk α, AOQL (approx.), and RQL at β = 10%.]

AOQ, AOQL & ATI Formulas

AOQ — Average Outgoing Quality

The average quality of outgoing product, accounting for the fact that rejected lots are screened 100% and returned perfect.

AOQ = p × Pₐ × (N−n)/N
Simplified: AOQ ≈ p × Pₐ

p = incoming defect rate, Pₐ = probability of acceptance, N = lot size, n = sample size

AOQL — Average Outgoing Quality Limit

The maximum (worst) AOQ for a given sampling plan — the peak of the AOQ curve. As incoming quality deteriorates beyond AOQL, AOQ actually improves because more lots get rejected and 100% screened.

AOQL = max(AOQ) across all p values

The Dodge-Romig sampling plan uses AOQL as its design criterion.

ATI — Average Total Inspection

Total average number of pieces inspected per lot, combining the sample (from accepted lots) and 100% screening (from rejected lots).

ATI = n·Pₐ + N·(1−Pₐ)
= n + (1−Pₐ)(N−n)

ATI increases sharply as incoming quality deteriorates — minimising ATI is the design goal of Dodge-Romig.

Worked Example — AOQ Calculation

Sampling plan: N = 1,000, n = 80, Ac = 3. Incoming lot has 2% defectives.

Pₐ = POISSON.DIST(3, 80×0.02, TRUE) = POISSON.DIST(3, 1.6, TRUE) = 0.921
AOQ = p × Pₐ = 0.02 × 0.921 = 0.0184 (1.84%)
ATI = n + (1−Pₐ)(N−n) = 80 + (1−0.921)(1000−80) = 80 + 0.079×920 = 80 + 72.7 = 152.7 pieces/lot

Interpretation: The average outgoing quality is 1.84% defective — slightly better than incoming (2%) because 8% of lots are 100% screened and returned perfect.
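
The same numbers fall out of a few lines of Python, using the Poisson approximation the worked example uses (the brute-force AOQL scan at the end searches incoming quality levels from 0.1% to 15%):

from math import exp, factorial

def pa_poisson(n, ac, p):
    """P(accept) under the Poisson approximation, mean = n*p."""
    m = n * p
    return sum(exp(-m) * m**d / factorial(d) for d in range(ac + 1))

N, n, Ac, p = 1000, 80, 3, 0.02
Pa = pa_poisson(n, Ac, p)
AOQ = p * Pa                        # simplified form; exact multiplies by (N-n)/N
ATI = n + (1 - Pa) * (N - n)
print(f"Pa = {Pa:.3f}, AOQ = {AOQ:.4f}, ATI = {ATI:.1f}")   # ~0.921, 0.0184, ~152

AOQL = max(q * pa_poisson(n, Ac, q) for q in (i / 1000 for i in range(1, 151)))
print(f"AOQL ~ {AOQL:.4f}")         # worst-case outgoing quality for this plan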

Inspection Levels — ANSI/ASQ Z1.4

Level | Sample size | When to use
Level I | Smaller n | Less discrimination needed — lower risk, trusted supplier
Level II | Standard n | Default / normal use — used unless otherwise specified
Level III | Larger n | Greater discrimination — critical characteristics or new suppliers
S-1 to S-4 | Small n | Special levels — small sample sizes when large sampling risks are acceptable. S-4 > S-3 > S-2 > S-1 in sample size.
💡

Sample size relationship: n(Level III) > n(Level II) > n(Level I). A larger sample size steepens the OC curve — better discrimination between good and bad lots, but higher inspection cost. The relationship between lot size and sample size is defined in Table I (code letters A–R).

Dodge-Romig Sampling Plans

Attribute | MIL-STD-105 / ANSI Z1.4 | Dodge-Romig
Basis | AQL — protects the producer | LTPD (consumer's risk) or AOQL — protects the consumer
Sampling types | Single, Double, Multiple | Single and Double only
Primary design goal | Ensure high-quality lots are accepted at a defined rate | Minimise ATI — least total inspection effort for a given quality protection level
Requires | AQL specification | Estimate of process average (from recent data); if unknown, use largest table value
Example | AQL = 1.5%, N = 1000 → n = 80, Ac = 3 | AOQL = 3%, N = 1000, process avg = 1.5% → n = 44, c = 2, LQL = 11.8%
💡

Dodge-Romig is the preferred plan when the consumer wants assurance that the outgoing quality will not exceed a stated limit (AOQL) regardless of incoming quality — ideal for critical product or safety-related items.

Risk Analysis

FMEA & RPN — Failure Mode & Effects Analysis

FMEA is the discipline of imagining every way something can go wrong — before it does. Two distinct types: Design FMEA catches failures born in the blueprint; Process FMEA catches failures born on the shop floor.


What is FMEA and Why Does It Matter?

FMEA forces you to think about failure proactively — before a customer finds it in the field, before a recall, before someone gets hurt. It is the bridge between design intent and production reality.

The core question for every item on the FMEA: "In what ways could this fail, what happens when it does, and what are we doing about it?"

📊 The FMEA Logic Chain — Every Row Answers These Three Questions
[Diagram: the FMEA logic chain. CAUSE (why does it happen? the root of the problem) leads to the FAILURE MODE (how does it fail? the specific malfunction), which causes the EFFECT (customer impact, drives the Severity rating). Severity (S) × Occurrence (O) × Detection (D) = RPN, the priority score.]
RPN Formula
RPN = S × O × D
Range: 1–1000
Severity (S)
1 = No effect
10 = Safety/regulatory
Occurrence (O)
1 = Unlikely
10 = Inevitable
Detection (D)
1 = Almost certain
10 = No detection

DFMEA vs PFMEA — Two Different Questions

📐
Design FMEA (DFMEA)

"Is the design itself capable of meeting its intended function under all expected use conditions?"

Owner: Design Engineering. Done during concept/development phase. Corrective actions = design changes.

🏭
Process FMEA (PFMEA)

"Can the manufacturing process consistently produce a conforming part without creating a defect?"

Owner: Manufacturing Engineering. Done pre-launch. Corrective actions = process controls, poka-yokes.

⚠️

The RPN trap. Two different failure modes can share the same RPN yet have radically different risk profiles. S=10, O=1, D=1 (RPN=10) is a potential safety catastrophe; S=2, O=5, D=1 (RPN=10) is inconsequential. Always act on high Severity first, regardless of RPN.

🔑 When to Do FMEA

  • New design or product

    DFMEA during concept phase when changes are cheap. PFMEA before production launch.

  • Design/process changes

    Update affected FMEA rows whenever a change is made — even "minor" changes.

  • Field failure or warranty

    Use FMEA to document and prevent recurrence. Add new failure modes discovered.

  • PPAP requirement

    DFMEA (if design owner) + PFMEA both required for Level 3 PPAP submission.

Design FMEA (DFMEA) — Catching Failures in the Blueprint

DFMEA asks: "Even if we manufacture this perfectly, does the design itself do what it's supposed to do under all conditions?" It lives in the design engineer's world — materials, tolerances, geometry, load cases, wear-out mechanisms, edge cases.

📌

DFMEA is required when the supplier owns the product design. If you are making a part to a customer drawing, a PFMEA is sufficient. If you designed the part, you must also do a DFMEA. Required for PPAP Level 3 when design responsibility is with the supplier.

The DFMEA Thought Process

  • 1

    Define the Function

    For each component, state its intended function precisely. "Transmit torque of 50±2 Nm from input shaft to output shaft without slippage under 100,000 cycles at 80°C."

  • 2

    Identify Failure Modes

    How could this component fail to perform its function? Examples: fracture, deformation, corrosion, excessive wear, loss of insulation, contact intermittent.

  • 3

    Determine Effects

    What does the next-higher assembly experience when this fails? What does the customer ultimately experience? Rate Severity 1–10.

  • 4

    Find Root Causes

    Design-level causes: insufficient material strength, wrong tolerance stack, inadequate surface finish spec, missing environmental protection, wrong material selection.

  • 5

    List Current Design Controls

    Prevention: Design guidelines, material specs, analysis (FEA, fatigue). Detection: Prototype testing, DVP&R, simulation, inspection. Rate Occurrence and Detection.

  • 6

    Take Action & Re-evaluate

    Implement design changes. Update specs, drawings, test plans. Recalculate RPN. Verify effectiveness.

Worked Example — EV Battery Cell Aluminium Casing

🔋

Function: Aluminium casing must contain electrolyte, withstand 50 bar internal pressure during thermal runaway, and maintain electrical isolation from adjacent cells for the 15-year vehicle life.

📋 DFMEA — EV Battery Cell Casing (selected rows)
Function | Failure Mode | Effect on Customer | S | Root Cause | O | Detection Control | D | RPN
Contain electrolyte | Casing crack / leak | Electrolyte contact → fire risk → vehicle loss | 10 | Wall thickness < 0.8 mm at weld seam; fatigue from thermal cycling | 3 | FEA fatigue analysis; 1,000-cycle pressure test at DVP stage | 2 | 60
Withstand 50 bar pressure | Burst / catastrophic rupture | Thermal runaway propagation → vehicle fire | 10 | Insufficient yield strength spec; wrong alloy grade selected | 2 | Burst test per UL 2580; FEA pressure simulation; material cert review | 1 | 20
Maintain electrical isolation | Dielectric breakdown | Cell-to-cell short → fire / BMS fault | 9 | Coating thickness < 20 µm at edges; holiday defects in anodising | 4 | Hi-pot test 100% incoming; SEM cross-section at sampling frequency | 3 | 108
15-year corrosion resistance | Pitting corrosion at weld | Gradual electrolyte seep → premature capacity fade | 6 | Wrong filler wire alloy in laser weld; porosity from humidity contamination | 4 | Salt spray test per ISO 9227; weld procedure qualification | 4 | 96
🎯

DFMEA Corrective Action Priority for this example: Row 3 (RPN=108, S=9) and Row 1 (RPN=60, S=10) are both flagged. The dielectric breakdown row is prioritised because S=9 AND the combined RPN is highest. Actions: increase anodising spec to ≥25 µm, add 100% hi-pot in design verification, update drawing callout.

Process FMEA (PFMEA) — Catching Failures on the Shop Floor

PFMEA asks: "Even with a perfect design, how could our manufacturing process build it wrong?" It lives in the process engineer's world — machines, operators, tooling, fixtures, parameters, environment, and measurement systems.

📌

PFMEA is always required. Whether you own the design or not, you always own your process. PFMEA is linked directly to the Process Flow Diagram and drives the Control Plan — these three documents must be consistent with each other.

PFMEA is Linked to Three Documents

📊 The Quality Triad — Process Flow, PFMEA, and Control Plan Must Be Consistent
[Diagram: the Quality Triad. The Process Flow defines process steps and parameters; those steps flow into the Process FMEA, which risk-ranks each step and assigns detection controls; the controls flow into the Control Plan, which documents what to monitor and how to react. All three must be consistent: step numbers, characteristic names, and controls must match exactly.]

PFMEA Thought Process

  • 1

    Map Every Process Step

    Start from the Process Flow Diagram. Each operation becomes one or more PFMEA rows. Be specific: "Laser weld casing" not just "welding."

  • 2

    State the Process Function

    What is this step supposed to achieve? "Weld casing at 1.5 kW, 3.5 m/min to achieve ≥ 0.8 mm penetration with ≤ 0.1 mm porosity."

  • 3

    Identify Failure Modes

    Ways the process step could go wrong: under-weld, over-weld, porosity, misalignment, wrong parameters, incorrect part seated, fixture worn.

  • 4

    Assess Effects

    What is the impact on the next operation? On the final customer? Rate Severity. Separate internal (scrap/rework) from external (field failure).

  • 5

    Find Process Causes

    Process-level causes (not design): machine wear, incorrect setup, operator error, wrong material lot, ambient temperature change, gage drift.

  • 6

    List Controls → Rate O and D

    Prevention: SPC, poka-yoke, maintenance plan, training. Detection: 100% visual, CMM check, functional test, SPC chart. Rate Occurrence and Detection honestly.

Worked Example — Laser Weld Station (Battery Cell Casing)

🏭

Process Step: Laser weld aluminium casing lid to body. Process parameters: Power = 1.5 kW, Speed = 3.5 m/min, Focus offset = 0 mm. Key characteristic: weld penetration ≥ 0.8 mm, porosity ≤ 0.1 mm dia.

📋 PFMEA — Laser Weld Station (selected rows)
Process Function | Failure Mode | Effect on Customer | S | Cause | O | Current Control | D | RPN
Weld penetration ≥ 0.8 mm | Under-penetration (< 0.8 mm) | Casing leak in field → electrolyte contact → fire | 10 | Laser power drift below threshold; focus offset shift; contaminated optic | 4 | SPC on laser power; cross-section destructive test 1/shift; weekly lens cleaning | 5 | 200
Porosity ≤ 0.1 mm dia. | Excess weld porosity | Reduced seal strength → gradual leak → capacity fade | 7 | Surface contamination (oil, moisture); shielding gas flow low; wrong travel speed | 5 | X-ray inspection 100% per lot; shielding gas flow alarm; pre-clean station | 3 | 105
Weld path alignment ±0.1 mm | Weld off-seam | Incomplete seal → early leak in service | 9 | Fixture wear > 0.05 mm; incorrect seam-tracking calibration | 2 | Vision system seam-tracker; fixture Cmk ≥ 1.67 validated monthly | 2 | 36
Heat input to cell ≤ 80°C | Thermal damage to cell chemistry | Premature capacity degradation; early cell death | 8 | Excessive weld speed reduction; multiple re-welds; coolant system failure | 2 | Thermocouple on fixture; weld parameter lockout; coolant flow alarm | 2 | 32
🚨

Row 1 (RPN=200, S=10) demands immediate action. Recommended actions: ① Install inline laser power monitoring with automatic stop if power deviates >2% for >50 ms. ② Increase cross-section check from 1/shift to 1/100 units for 4 weeks until process is validated. ③ Add daily optics cleaning to PM schedule. Target RPN after actions: S=10, O=2, D=2 → RPN=40.

Interactive RPN Calculator

Drag the sliders to set Severity, Occurrence, and Detection. RPN and Action Priority update instantly. Remember: S = 9 or 10 always requires action, regardless of RPN.

[Interactive calculator: example inputs S = 7 (1 = no effect … 10 = safety/regulation), O = 5 (1 = remote, <1/1.5M … 10 = very high, ≥1/2), D = 5 (1 = almost certain detect … 10 = no control) → RPN = 175, HIGH RISK — mandatory action required.]

RPN < 50 | Low Risk | Document and monitor
RPN 50–125 | Medium Risk | Review & improve if feasible
RPN > 125 | High Risk | Mandatory action — no exceptions
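
The zone thresholds above, plus the S = 9/10 override, fit in one small function. A sketch using exactly these cut-offs:

def rpn_zone(s, o, d):
    """Classify per the zones above; S of 9 or 10 always mandates action."""
    rpn = s * o * d
    if s >= 9 or rpn > 125:
        return rpn, "High Risk: mandatory action"
    if rpn >= 50:
        return rpn, "Medium Risk: review & improve if feasible"
    return rpn, "Low Risk: document and monitor"

print(rpn_zone(7, 5, 5))    # (175, 'High Risk: mandatory action')
print(rpn_zone(10, 1, 1))   # RPN only 10, but S=10 forces action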

Severity / Occurrence / Detection Rating Scales (AIAG PFMEA)

Severity (S)
S | Effect | Criteria
10 | Hazardous — no warning | Safety issue, regulatory non-compliance. Failure without warning.
9 | Hazardous — with warning | Safety issue. Failure with warning before occurrence.
8 | Very High | System inoperable, loss of primary function.
7 | High | System operable, reduced performance. Customer dissatisfied.
6 | Moderate | System operable, comfort item inoperable. Customer discomfort.
5 | Low | System operable, comfort item reduced performance.
4 | Very Low | Fit/finish defect noticed by most customers (70%).
3 | Minor | Fit/finish defect noticed by average customers (50%).
2 | Very Minor | Defect noticed only by discriminating customers (25%).
1 | None | No discernible effect whatsoever.
Occurrence (O)
O | Probability | Approximate Rate
10 | Very High | ≥ 1 in 2
9 | Very High | 1 in 3
8 | High | 1 in 8
7 | High | 1 in 20
6 | Moderate | 1 in 80
5 | Moderate | 1 in 400
4 | Moderate | 1 in 2,000
3 | Low | 1 in 15,000
2 | Low | 1 in 150,000
1 | Remote | ≤ 1 in 1,500,000
Detection (D)
D | Ability to Detect | Typical Control
1 | Almost Certain | Proven poka-yoke — physically impossible to pass
2 | Very High | 100% automated gauge with alarm & stop
3 | High | 100% automated gauge, no automatic stop
4 | Moderately High | SPC with immediate reaction plan
5 | Moderate | SPC — operator reacts to out-of-control signal
6 | Low | 100% manual inspection — variable attribute
7 | Very Low | Random or double sampling only
8 | Remote | Visual inspection only, no documented method
9 | Very Remote | No detection control — will be found by end user
10 | No Control | No inspection. Defect certain to reach customer.
💡

Detection scale is counter-intuitive: D=1 is best (certain to detect before customer), D=10 is worst (no control). The inverse scale trips people up constantly — lower Detection score means better controls. A poka-yoke that makes a defect physically impossible to produce gets D=1.

AIAG-VDA FMEA 2019 — What Changed and Why It Matters

The 2019 AIAG-VDA FMEA Handbook supersedes both AIAG FMEA 4th Edition and VDA Volume 4. It represents the most significant overhaul of automotive FMEA methodology in 25 years.

The Core Problem with Classic RPN

❌ Problem with Classic RPN

S=10, O=1, D=1 gives RPN=10

S=2, O=5, D=1 also gives RPN=10

The first case is a potential safety catastrophe. The second is trivial. Classic RPN treats them identically.

✅ AIAG-VDA Solution: Action Priority

An Action Priority (AP) Table replaces the single RPN number. It uses a three-dimensional lookup — S, O, and D — to determine priority rather than simple multiplication.

S=9/10 always → High AP, regardless of O or D values.

Action Priority (AP) Categories

Priority | Action Required | What it Means
High (H) | Mandatory action | Team MUST identify appropriate actions to improve prevention and/or detection. Management review required. Escalate if no actions identified.
Medium (M) | Recommended action | Team SHOULD identify improvement actions. Management discretion on whether to escalate. Document rationale if no action taken.
Low (L) | At team discretion | Team should consider improvement if easily achievable. Document rationale if no action taken.

Key Changes in AIAG-VDA 2019 vs Classic FMEA

Topic | Classic AIAG FMEA 4th Ed. | AIAG-VDA 2019
Risk metric | Single RPN number (S×O×D) | Action Priority (AP) table — 3-dimensional
Severity 9/10 | May have low RPN, ignored | Always = High AP, always requires action
Process | 5 steps | 7 steps (adds Planning & Preparation, Documentation)
Prevention vs Detection | Single "Current Controls" column | Separate: Prevention Controls + Detection Controls
New FMEA type | Not present | MSR — Monitoring & System Response (functional safety)
Failure chain | Mode → Effect → Cause | Structure Analysis → Function Analysis → Failure Analysis
Standard | AIAG FMEA 4th Ed. (2008) | AIAG-VDA FMEA Handbook (2019, joint)
📌

Transition Note: Many automotive OEMs are migrating to AIAG-VDA 2019 format and will begin requiring it in new PPAP packages. However, the traditional S×O×D RPN approach remains valid for non-automotive applications and is still widely used in military standards (MIL-STD-1629A), medical devices (ISO 14971), and aerospace (SAE ARP4761). When in doubt, confirm the customer's required FMEA format before starting.

Design & Development — ISO 9001:2015 §8.3

ISO 9001:2015 Section 8.3 establishes requirements for controlling the design and development of products and services. 80% of product costs are fixed at the design stage — making rigorous design control the highest-leverage quality activity.

ISO 9001:2015 §8.3 Structure

Clause | Requirement | Key points
§8.3.1 | General | Establish, implement, and maintain a design and development process appropriate to ensure products meet requirements
§8.3.2 | Planning | Determine stages and controls, reviews, responsibilities, interfaces; consider nature, duration, and complexity
§8.3.3 | Inputs | Functional and performance requirements; statutory and regulatory requirements; previous similar designs; standards; potential failure consequences (FMEA, QFD, DFX, DFSS)
§8.3.4 | Controls | Reviews — evaluate results vs requirements (§8.3.4b); Verification — outputs meet inputs (§8.3.4c); Validation — product meets intended use (§8.3.4d)
§8.3.5 | Outputs | Meet input requirements; specify characteristics for provision; include acceptance criteria; identify critical characteristics
§8.3.6 | Changes | Identify, review, and control changes; review effects on constituent parts and already-delivered products

Design Review vs Verification vs Validation

📋 Design Review (§8.3.4b)

Evaluate ability of design results to meet requirements. Typically at 30%, 60%, 90% milestones. Multi-disciplinary for complex products. Areas: objectives, assumptions, alternatives, risks, budget, safety, maintainability.

✅ Verification (§8.3.4c)

Ensure design outputs meet design input requirements. "Are we building it right?" Checks design-to-spec conformance.

Outputs ⊇ Inputs
🎯 Validation (§8.3.4d)

Ensure products meet requirements for intended use. "Are we building the right thing?" Tests against real-world customer use.

Product ⊇ Customer need

Design for X (DFX) — Design Excellence Disciplines

80% of product costs are fixed at the design stage. DFX disciplines optimise a specific aspect of the product. Note they sometimes conflict — integrated product development teams balance competing objectives.

DFX | Full name | Primary objective | Key actions
DFM | Design for Manufacturing | Reduce manufacturing cost and difficulty | Reduce parts count; minimise fasteners; use standard parts (lower cost, shorter lead time, more reliable)
DFA | Design for Assembly | Ease and speed of assembly | Reduce parts; self-locating features; single-direction assembly
DFMaint | Design for Maintainability | Reduce downtime and maintenance cost | Easy access to serviceable parts; standardised replacement parts; reduced skill level; easy fault detection
DFR | Design for Reliability | Extend product useful life | Design for useful life; consider infant mortality and wear-out; remove weaknesses via FMEA; stress and derating
DFC | Design for Cost | Minimise total lifecycle cost | Use standard components; optimise tolerances; design for reuse and modularity
DFLog | Design for Logistics | Ease transport, storage, and tracking | Easy transport and storage; barcodes/traceability; standardisation; reusable packaging
DFEnv | Design for Environment | Minimise environmental impact | Design for repair, reuse, recycling; minimise hazardous materials; easy disassembly

Design for Six Sigma (DFSS) — Methodologies

DFSS applies to new product/process design where no existing process exists to improve. Unlike DMAIC (which improves existing processes), DFSS builds quality in from concept.

DMADV
Define: Process/design goals; identify CTQs
Measure: Measure CTQ aspects; establish baseline
Analyse: Analyse designs; identify best alternatives
Design: Detail design of product or process
Verify: Verify the chosen design meets requirements
DMADOV
Define: Goals and customer needs
Measure: CTQs and performance gaps
Analyse: Design alternatives
Design: Detail the design
Optimise: Refine — parameter and tolerance design
Verify: Verify and validate the design
IDOV
Identify: Voice of Customer; translate to CTQs
Design: Detail design of product or process
Optimise: Analyse and optimise design alternatives
Verify: Verify the chosen design
💡

IDOV explicitly starts with VOC — most customer-centric of the three DFSS methodologies

Technical Drawings, Tolerances & GD&T

Technical drawings are the universal language between design and manufacturing. The quality engineer must read drawings, understand tolerances, and interpret GD&T symbols — skills exercised daily in engineering practice.

1st Angle vs 3rd Angle Projection

Attribute | 1st Angle (Europe / ISO) | 3rd Angle (USA / ASME)
Object position | Object in the first quadrant | Object in the third quadrant
View relationship | Object between observer and projection plane | Projection plane between observer and object
Projection plane | Non-transparent | Transparent
Top view placement | Below front view | Above front view
Standards | ISO / BS / DIN | ASME / ANSI

Title Block Contents

Mandatory | Additional
Organisation name/logo | Bill of materials
Drawing title & number | Notes & zone references (e.g. A5, B3)
Sheet & revision number | Finish / Weight / Heat treatment
Approvals (Prepared/Checked/Approved) | General tolerances
Units, scale, projection symbol | Surface roughness

Engineering Drawing Line Types

Line type | Purpose
Construction (light, thin) | Auxiliary construction, projection lines
Outline (thick continuous) | Visible boundary of the object
Hidden (thin dashed) | Edges not visible from the current view
Centreline (chain) | Axis of symmetry, hole centres, pitch circles
Dimension line | Shows extent of a dimension with arrowheads
Break line (zigzag) | Object continues beyond drawn portion
Cutting plane (thick chain) | Defines plane of a section view
Hatch / section lines | Material cross-section in section views

Dimensioning Methods & Tolerance Fit Types

Dimensioning Methods

Method | Description | Risk
Chain | Dimensions placed end-to-end | Tolerance accumulation / stack-up
Parallel | Multiple dimension lines all from the same datum; no accumulation | More space required
Running | Parallel style but superimposed on one line; origin point marked | Can be harder to read

MMC, LMC & Fit Types

Term | Definition | Example
MMC | Maximum material within tolerance | Smallest hole, largest pin
LMC | Least material within tolerance | Largest hole, smallest pin
Clearance fit | Always space between mating parts | Sliding bearings
Interference fit | Parts always interfere — press/shrink fit | Press fits, permanent assembly
Transition fit | May be clearance or interference depending on actual dims | Locating fits

GD&T — Geometric Dimensioning & Tolerancing (ASME Y14.5)

GD&T is a symbolic language (ASME Y14.5-2009) that defines geometry according to functional limits. It provides a universal language between supplier, checker, and buyer — eliminating ambiguity in conventional ± tolerances.

Datum Reference Frame & Degrees of Freedom

A Datum is a perfect theoretical point, line, or plane. A Datum Feature is the physical surface where the datum is located. Three perpendicular datum planes constrain all 6 degrees of freedom:

Datum | DOF constrained | Running total
Primary (A) | 3 (2 rotations + 1 translation) | 3 of 6
Secondary (B) | 2 (1 rotation + 1 translation) | 5 of 6
Tertiary (C) | 1 (final translation) | 6 of 6 — fully constrained

GD&T Characteristic Categories

Category | Characteristics (ASME Y14.5)
Form | Flatness, Straightness, Circularity, Cylindricity
Orientation | Angularity, Perpendicularity, Parallelism
Location | True Position, Concentricity, Symmetry
Runout | Circular Runout, Total Runout
Profile | Profile of a Line, Profile of a Surface
💡

Flatness example: A glass sheet 1000×500mm with flatness 0.2mm means the entire surface must lie within two parallel planes separated by 0.2mm — independent of the ±5mm size tolerance.

Robust Design & Signal-to-Noise Ratios

Robust design improves quality by minimising the effects of variation without eliminating its causes. Taguchi's signal-to-noise (S/N) ratios identify control factor settings that make the product insensitive to noise.

Control Factors vs Noise Factors

Type | Definition | Examples
Control Factors | Can be set and controlled by the engineer | Welding: electrode type, position, preheat
Outer Noise | Consumer use conditions — difficult/expensive to control | Temperature, humidity, vibration, UV
Inner Noise | Product deterioration over time | Rusting, oxidation, wear, degradation
Between-Product Noise | Piece-to-piece variation | Dimensional variation, material property variation

Three Design Stages (Taguchi)

① Conceptual Design

Select the best design concept from alternatives using feasibility and technology benchmarking.

② Parameter Design ← most important

Identify control factor settings that maximise SNR — make the product insensitive to noise. Uses orthogonal arrays. This is where Taguchi's method adds the most value.

③ Tolerance Design

Tighten tolerances only where necessary — reduces cost by avoiding unnecessarily tight tolerances everywhere.

Signal-to-Noise Ratio — What It Means Visually

SNR is the ratio of useful signal (what you want) to noise (what you don't want). A higher SNR means the product's response is dominated by the intended behaviour, not by variation. Taguchi's insight: maximise SNR always — regardless of whether the goal is smaller, larger, or on-target.

What SNR Measures — Low SNR vs High SNR
[Diagram: low SNR vs high SNR. When signal ≪ noise, the true response is hard to detect; when signal ≫ noise, the output is consistent and predictable.]
Effect of Maximising SNR — Before vs After Robust Design
[Diagram: response distributions (e.g. weld strength) before and after robust design. Before: low SNR, high variance, tails outside LSL/USL. After: high SNR, low variance, centred on target. The mean is unchanged; only the variance is reduced.]
The key insight of robust design: the mean does not need to move — only the variance shrinks. By choosing control factor levels that maximise SNR, the product becomes insensitive to noise, so the distribution tightens around the target.

Three SNR Formulas — With Visual Context

The goal is always to maximise SNR. Taguchi unified three different engineering objectives into one consistent framework by choosing formulas where the maximum SNR always corresponds to the desired outcome.

Smaller is Better

Ideal = 0 or minimum. Wear, defects, contamination, shrinkage, response time.

S/N = −10 log₁₀[ (1/n) Σ Yᵢ² ]
Higher S/N = smaller mean AND smaller variance
Larger is Better

Ideal = maximum. Tensile strength, yield, fuel efficiency, pull force, adhesion.

S/N = −10 log₁₀[ (1/n) Σ (1/Yᵢ²) ]
1/Y² penalises small values — maximising S/N maximises Y
Nominal is Better

Specific target value. Dimensions, resistance, weight, temperature, voltage output.

S/N = 10 log₁₀( Ȳ² / s² )
Ȳ/s is the reciprocal of the coefficient of variation — higher = tighter around the target
💡

Key insight: All three SNR formulas use log base 10 (decibels). A higher SNR always means a more robust product. The sign convention ensures that maximising SNR always corresponds to the engineering objective — this is Taguchi's elegant unification of the three cases.
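
All three ratios are one-liners in code. A minimal sketch (the four weld-strength readings are hypothetical):

from math import log10
from statistics import mean, stdev

def snr_smaller(ys):   # smaller-is-better
    return -10 * log10(sum(y * y for y in ys) / len(ys))

def snr_larger(ys):    # larger-is-better
    return -10 * log10(sum(1 / (y * y) for y in ys) / len(ys))

def snr_nominal(ys):   # nominal-is-best
    return 10 * log10(mean(ys) ** 2 / stdev(ys) ** 2)

welds = [51.2, 49.8, 50.5, 50.1]    # hypothetical weld strengths, MPa
print(f"larger-is-better: {snr_larger(welds):.2f} dB")
print(f"nominal-is-best:  {snr_nominal(welds):.2f} dB")

Comparing these values across candidate control-factor settings, and keeping the settings with the highest SNR, is the core move of parameter design.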

Risk Management

Risk Management

A structured approach to identifying, analyzing, and responding to uncertainty. Covers risk definitions (ISO 31000 & ISO 9000:2015), the full 5-step risk management process, qualitative and quantitative analysis tools including the Probability & Impact Matrix, and response strategies for both negative and positive risks.

Risk: Definitions & Key Concepts

Risk has two authoritative definitions in the quality engineering world. Understanding the nuance between them — and how they relate to opportunities and issues — is fundamental to every practitioner.

ISO 31000:2018

Effect of uncertainty on objectives

The enterprise risk management standard. Broad definition applicable to any organization at any level — strategic, operational, project, or product.

ISO 9000:2015

Effect of uncertainty

The quality management vocabulary standard. An effect is a deviation from the expected — positive or negative. Risk is characterized by potential events, consequences, and their likelihood of occurrence.

Term | Definition | Key Distinction
Risk | Effect of uncertainty on objectives; can be positive or negative | Future event — has not yet occurred
Opportunity | A positive risk — uncertainty with a favorable effect on objectives | You want to maximize these; exploit them
Issue | A risk that has already occurred | No longer a future uncertainty — a current problem requiring immediate response
Threat | A negative risk — uncertainty with an unfavorable effect on objectives | You want to minimize, transfer, or avoid these
Risk Appetite | The amount and type of risk an organization is willing to pursue or accept | Set by leadership; informs prioritization thresholds
Residual Risk | The risk remaining after risk responses have been implemented | Even after mitigating, some risk always remains

Risk vs. Issue: Risk = future potential event. Issue = risk that has materialised. Once a risk occurs, it transitions to an issue and requires a workaround or corrective action, not a contingency plan.

Why Take Risk?

⚖️
Risk vs. Reward

There is always a balance between risk and reward. Managing risk means finding the optimal point — not eliminating all risk.

📈
More Risk → More Reward?

Generally true — but not always. Higher risk does not guarantee higher reward. Smart risk management seeks better returns per unit of risk taken.

🎯
Optimize — Don't Eliminate

The goal is more rewards with less risk — achieved through systematic identification, analysis, and response planning.

ISO 9000:2015 — Key Nuances

  • An effect is a deviation from the expected — positive or negative
  • Risk is often characterized by reference to potential events and consequences, or a combination
  • Risk is often expressed as a combination of consequences × likelihood
  • The word "risk" is sometimes used only for negative consequences — but ISO 9000 explicitly includes positive effects

Risk Management: The 5-Step Process

Risk management is the identification, assessment, and prioritization of risks (positive or negative) followed by coordinated and economical application of resources to minimize, monitor, and control the probability and/or impact of unfortunate events — or to maximize the realization of opportunities.

🔄 Risk Management — 5-Step Process Flow
1. Plan Risk Management → 2. Identify Risks → 3. Analyze Risks → 4. Plan Risk Response → 5. Monitor & Control Risks
Step | Process | Key Activities | Output
1 | Plan Risk Management | Define risk terms; define roles & responsibilities; select tools & templates; establish how to identify, analyze, respond to, and monitor risks | Risk Management Plan
2 | Identify Risks | Systematic, methodic group process involving management, employees, customers, and other stakeholders; use brainstorming, FMEA, SWOT, Ishikawa | Risk Register
3 | Analyze Risks | Qualitative (P&I Matrix — quick, subjective) and/or quantitative (EMV, Monte Carlo, Decision Tree — detailed, analytic); prioritize risks | Prioritized Risk List
4 | Plan Risk Response | For negative risks: Avoid, Mitigate, Transfer, Accept. For positive risks: Exploit, Enhance, Share, Accept. Assign risk owners. | Risk Response Plan
5 | Monitor & Control Risks | Periodically review risk register; identify new risks; close resolved risks; conduct risk audits; handle unexpected risks with workarounds | Updated Risk Register; Workarounds
💡

Risk Management is iterative, not sequential: Although presented as 5 steps, risk management is a continuous loop. New risks emerge throughout a project or product lifecycle. The risk register is a living document that must be reviewed regularly — not created once and filed away.

Step 1 Detail: Plan Risk Management

What to Define
  • ✦ Risk-related terms and definitions
  • ✦ Roles and responsibilities (Risk Owner concept)
  • ✦ Tools and templates for risk management
  • ✦ Probability & impact scales to be used
  • ✦ Risk thresholds (what score triggers action)
Planning Covers How to…
  • ✦ Identify risks (who, when, tools)
  • ✦ Analyze risks (qualitative and/or quantitative)
  • ✦ Plan risk responses (owners, strategies)
  • ✦ Monitor and control risks (frequency, triggers)

Step 2: Identify Risks

Risk identification is a systematic and methodic process best performed in a group environment. A wide range of stakeholders participate — management, employees, customers, and other interested parties. The output is a Risk Register listing all identified risks.

Key Characteristics

  • Systematic and methodic — not ad hoc
  • Best done in a group environment
  • Involves wide range of stakeholders
  • Identifies both positive and negative risks
  • Iterative — risks can emerge at any time

Who Participates?

Management · Employees · Customers · Suppliers · Other Stakeholders · Subject Matter Experts

Tools for Risk Identification

Tool | Type | How It's Used for Risk ID | Best For
Brainstorming | Group technique | Most common approach; free-form idea generation in a group; facilitator captures all risks without judgment | All risk identification; starting point for any risk session
Ishikawa Diagram | Cause & Effect | Systematically explores causes across categories (Man, Machine, Method, Material, Environment, Measurement) | Process risks; identifying root-cause risk categories
Flow Diagram | Process mapping | Map the process; identify each step where something could go wrong — inputs, outputs, handoffs, decision points | Operational and process risks; supply chain risk
SWOT Analysis | Strategic tool | Strengths, Weaknesses, Opportunities, Threats; internal and external risk identification | Strategic and organizational risk; positive risks (opportunities)
FMEA | Failure analysis | Systematically identifies failure modes and their effects; each failure mode is a potential risk | Product/process design risks; manufacturing risks
Checklist / Historical Data | Historical reference | Review lessons learned from previous projects/products; use industry-standard risk checklists | Repeatable processes; established product lines
Expert Interviews / Delphi | Expert elicitation | Individual or structured group interviews; Delphi uses iterative anonymous surveys to converge on consensus | Novel technologies; unique or high-stakes projects

The Risk Register

The risk register is the primary output of the Identify Risks process. It is a living document that is updated throughout all subsequent risk management steps.

Risk Register Field | Description
Risk ID | Unique identifier for each risk
Risk Description | Clear statement of the risk event and its potential cause and effect
Risk Category | Classification (Technical, Schedule, Cost, Scope, External, etc.)
Probability Score | Likelihood of occurrence (added during Analyze step)
Impact Score | Consequence severity if the risk occurs (added during Analyze step)
Risk Score | Probability × Impact (added during Analyze step)
Risk Owner | Person responsible for monitoring and responding to this risk
Response Strategy | Planned approach (Avoid/Mitigate/Transfer/Accept or Exploit/Enhance/Share/Accept)
Response Actions | Specific actions to implement the chosen strategy
Status | Active / Closed / Occurred (became an Issue)

Step 3: Analyze Risks

Risk analysis prioritizes identified risks so that resources and attention can be focused on the highest-priority items. There are two main approaches: qualitative and quantitative.

Qualitative Risk Analysis

Quick and easy to perform. Uses descriptive or ordinal scales. Subjective by nature but valuable for initial prioritization when data is limited or time is short.

  • ✦ Fast and cost-effective
  • ✦ Subjective judgment
  • ✦ Uses rating scales (Low/Medium/High or 1–9)
  • ✦ Primary tool: Probability & Impact Matrix
  • ✦ Good for all risks as initial screen
Quantitative Risk Analysis

Detailed and time-consuming. Uses numerical data to produce a statistical analysis of risk impact. Analytic, data-driven, and defensible.

  • ✦ Requires real data or estimates
  • ✦ Objective and numeric
  • ✦ Tools: EMV Analysis, Monte Carlo, Decision Tree
  • ✦ Used for high-priority risks (from qualitative screen)
  • ✦ Provides probability distributions of outcomes

Quantitative Analysis Tools

Tool | Description | When to Use
Expected Monetary Value (EMV) | EMV = Probability × Impact ($). Calculates the expected financial value of a risk. Positive EMV = opportunity; negative EMV = threat. Sum all EMVs to get overall risk exposure. | Cost/benefit decisions on risk responses; comparing alternative responses; setting contingency reserves
Monte Carlo Analysis | Computer simulation that runs the project/process thousands of times with randomly sampled input values. Produces a probability distribution of outcomes (cost, schedule, etc.). | Complex projects with many interacting risks; when you need confidence intervals on outcomes
Decision Tree | Diagram showing decisions, chance events (with probabilities), and outcomes (with values). Calculate EMV at each branch to determine the best decision path. | Go/no-go decisions; make-or-buy; alternative response strategies; multi-stage decisions under uncertainty
Sensitivity Analysis | Determines which risk variable has the most impact on outcomes. Often visualized as a Tornado Diagram — bars sorted by impact magnitude. | Identifying which risks deserve the most attention; resource prioritization
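
EMV is the easiest of these tools to automate. A sketch over a hypothetical three-risk register (names, probabilities, and dollar impacts invented for illustration; negative impact = threat, positive = opportunity):

risks = [  # (name, probability, impact in $)
    ("Supplier line down",   0.30, -120_000),
    ("Tooling rework",       0.15,  -40_000),
    ("Early PPAP incentive", 0.20,  +25_000),
]
for name, p, impact in risks:
    print(f"{name}: EMV = ${p * impact:+,.0f}")
print(f"Overall risk exposure = ${sum(p * i for _, p, i in risks):+,.0f}")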

FMEA vs. P&I Matrix — Key Comparison

Aspect | FMEA (Risk Priority Number) | Probability & Impact Matrix
Formula | RPN = Severity × Occurrence × Detection | Risk Score = Probability × Impact
Dimensions | 3 dimensions (adds Detection) | 2 dimensions (no Detection factor)
Impact / Severity | Severity (1–10 scale) | Impact (similar concept; often 1–9 scale)
Probability | Occurrence (1–10 scale) | Probability (1–9 or Low/Med/High)
Detection | Detectability score (1–10, inverse) | Not included
Primary Use | Product/process failure analysis | Project/process risk prioritization
Context | Design and process engineering | General risk management

Probability & Impact (P&I) Matrix

The Probability and Impact Matrix is the primary qualitative risk analysis tool. It evaluates each risk on two dimensions — likelihood of occurrence and potential consequence — then combines them into a risk score used for prioritization.

Core Formula
Risk Score = Probability × Impact

Higher scores = higher priority risks requiring more immediate attention and resource allocation.

Sample Probability Scale

Category | Score | Description
Very High | 9 | Risk event expected to occur
High | 7 | Risk event more likely than not to occur
Probable | 5 | Risk event may or may not occur (50/50)
Low | 3 | Risk event less likely than not to occur
Very Low | 1 | Risk event not expected to occur

Sample Impact Scale (by Project Objective)

Objective | Very Low (1) | Low (3) | Moderate (5) | High (7) | Very High (9)
Cost | Insignificant | <10% cost impact | 10–20% cost impact | 20–40% cost impact | >40% cost impact
Schedule | Insignificant | <5% schedule slip | 5–10% schedule slip | 10–20% schedule slip | >20% schedule slip
Scope | Barely noticeable | Minor areas impacted | Major areas impacted | Changes unacceptable to client | Product becomes useless
Quality | Barely noticeable | Minor functions impacted | Client must approve reduction | Quality reduction unacceptable | Product becomes useless

P&I Matrix — Numerical (1–9 Scale)

📊 Probability × Impact Matrix — Risk Scores
Prob ↓ / Impact → | 1 (Very Low) | 3 (Low) | 5 (Moderate) | 7 (High) | 9 (Very High)
9 (Very High) | 9 | 27 | 45 | 63 | 81 ★
7 (High) | 7 | 21 | 35 | 49 | 63
5 (Moderate) | 5 | 15 | 25 | 35 | 45
3 (Low) | 3 | 9 | 15 | 21 | 27
1 (Very Low) | 1 ☆ | 3 | 5 | 7 | 9
Legend: Low Risk — Monitor · Medium Risk — Plan Response · High Risk — Immediate Action · Critical Risk — Top Priority
📝

Exam Example: A risk has Very Low probability (score = 1) but Very High impact (score = 9). Risk Score = 1 × 9 = 9. This falls in the yellow zone — medium priority. Compare to a risk with Moderate probability (5) and Moderate impact (5) = score of 25 — which is higher priority despite neither dimension being extreme.

Step 4: Plan Risk Response

Risk response planning determines how to decrease the possibility of negative risks affecting objectives and how to increase the possibility of positive risks helping objectives. Strategies differ depending on whether the risk is negative (threat) or positive (opportunity).

Negative Risk (Threat) Responses

Goal: Reduce the probability, impact, or both of a negative event affecting your objectives.

🚫 AVOID

Change the plan to eliminate the risk entirely. The risk event becomes impossible.

Examples: Adopt proven approach instead of new one; improve team communication; change project scope

⚠️ MITIGATE

Reduce the probability and/or impact of the risk. The risk may still occur but its effect is lessened.

Examples: Simplify processes; develop prototype; additional inspections; lessons learned from past projects

🔄 TRANSFER

Shift the financial impact of the risk to a third party. The risk still exists — it's moved, not eliminated.

Examples: Insurance; performance warranty; subcontracting; fixed-price contracts

✅ ACCEPT

Acknowledge the risk and take no action — when no action is feasible or impact is too small.

Passive: No contingency plan; monitor and address if/when it occurs
Active: Create contingency plan in advance; monitor triggers
Positive Risk (Opportunity) Responses

Goal: Increase the probability, impact, or both of a positive event benefiting your objectives.

🎯 EXPLOIT

Eliminate uncertainty — ensure the opportunity definitely happens and make maximum use of it.

Examples: Assign best team members; allocate additional resources; fast-track the opportunity

📈 ENHANCE

Increase the probability and/or positive impact of the opportunity. Unlike Exploit, the opportunity may still not occur.

Examples: Add more resources; improve preconditions; invest in enablers of the opportunity

🤝 SHARE

Allocate some or all of the opportunity to a third party best able to capture it.

Examples: Joint venture; partnership; risk-sharing team; consortium or special-purpose company

✅ ACCEPT

Accept the opportunity if it occurs but do not actively pursue it — when the probability and rewards are not attractive enough to justify investment.

Example: A beneficial side-effect of another activity that will be welcomed but not specifically engineered

🔍

Accept applies to both sides: Accept is the only strategy that appears in both negative and positive risk response tables — but its meaning differs. For threats, Accept means tolerating the risk (passive or active). For opportunities, Accept means welcoming the benefit if it naturally occurs without actively pursuing it.

Step 5: Monitor & Control Risks

Risk monitoring and control is an ongoing process throughout the entire project or product lifecycle — not just a phase-end activity. The goal is to keep the risk register current, ensure response plans are being executed, and handle unexpected risks as they arise.

Core Activities

Ongoing Reviews
  • ✦ Regularly review identified risks — are they still relevant?
  • ✦ Identify and add new risks that emerge
  • ✦ Remove risks that are no longer relevant or have been resolved
  • ✦ Track triggers (warning signs) that indicate a risk is about to occur
Risk Audits
  • ✦ Verify that risk response plans are actually being implemented
  • ✦ Confirm effectiveness of implemented responses
  • ✦ Document lessons learned for future risk management
  • ✦ May be conducted by an independent auditor

Handling Unexpected Risks: Workarounds

A workaround is an unplanned response to a risk event that was not identified or not expected. When a risk materializes as an issue and no contingency plan exists, a workaround is improvised to minimize the impact.

  • ▸ Used to deal with unexpected risks to reduce their impact
  • ▸ Workarounds should be documented — they become lessons learned and may identify new risks
  • ▸ Distinguished from contingency plans: contingency = planned in advance; workaround = improvised on the fly

Risk vs. Issue vs. Workaround — Key Distinctions

Concept | Timing | Response Type | Documentation
Risk | Future — has not yet occurred | Contingency plan (planned in advance) | Risk Register
Issue | Present — the risk has now occurred | Execute contingency plan (if one exists) or workaround | Issue Log / Risk Register update
Workaround | Present — unidentified risk has occurred | Improvised, unplanned response | Document for lessons learned; update risk register
Contingency Plan | Created in advance (during Plan Risk Response) | Pre-defined actions triggered when a specific risk occurs | Risk Register / Risk Response Plan
💡

Risk Monitoring is Continuous: Risks change over time. A low-probability risk can become high-probability as circumstances change. A risk can be closed when conditions change such that it can no longer occur. New risks can emerge at any stage. Regular risk review meetings are best practice.

Risk Management — Quick Reference & Exam Summary

Key formulas, mnemonics, and comparison tables for rapid reference.

Key Formulas
Risk Score = Probability × Impact
FMEA RPN = Severity × Occurrence × Detection
EMV = Probability (%) × Impact ($)
5-Step Process Mnemonic
Plan → Identify → Analyze → Respond → Monitor

"Please Identify All Risk Management" — steps in order

Negative vs. Positive Risk Strategies — Side by Side

Negative Risk (Threat) | Description | Positive Risk (Opportunity) | Description
Avoid | Eliminate the risk entirely — change the plan | Exploit | Ensure the opportunity definitely happens
Mitigate | Reduce probability and/or impact | Enhance | Increase probability and/or impact
Transfer | Shift financial impact to a third party | Share | Share the opportunity with a third party
Accept | Tolerate the risk (passive or active) | Accept | Welcome it if it occurs — without actively pursuing

Common Pitfalls to Avoid

Trap | Correct Understanding
Thinking all risks are negative | ISO 9000:2015 explicitly includes positive risks (opportunities). "Positive risk" is not an oxymoron.
Confusing Risk with Issue | Risk = future potential event. Issue = risk that has already materialized. They require different responses.
Thinking Transfer eliminates risk | Transfer moves the financial consequence to a third party — the risk event can still occur. It is not Avoid.
Confusing FMEA RPN with P&I Matrix score | FMEA uses 3 factors (including Detection). P&I Matrix uses only 2 (Probability × Impact, no Detection).
Thinking Qualitative is always done before Quantitative | True in practice — qualitative screens and prioritizes. But both can be used on different risks depending on data availability.
Passive vs. Active Acceptance | Passive = no plan; deal with it if it happens. Active = create a contingency plan in advance for the accepted risk.
Workaround vs. Contingency Plan | Contingency plan = pre-planned response for an identified risk. Workaround = improvised response for an unexpected/unidentified risk.

Risk Management Tools Summary

Tool | Step Used | Type | Key Feature
Brainstorming | Identify | Group technique | Most common identification tool
Ishikawa Diagram | Identify | Cause & effect | Organizes causes by category (6M)
SWOT Analysis | Identify | Strategic | Captures positive risks (Opportunities)
FMEA | Identify / Analyze | Failure analysis | RPN = Severity × Occurrence × Detection
P&I Matrix | Analyze (Qualitative) | Risk prioritization | Risk Score = Probability × Impact; color-coded zones
EMV Analysis | Analyze (Quantitative) | Financial analysis | EMV = P(%) × Impact($); sum across all risks
Monte Carlo | Analyze (Quantitative) | Simulation | Probability distribution of project outcomes
Decision Tree | Analyze (Quantitative) | Decision analysis | Visual branching of decisions and outcomes
Risk Register | All steps | Living document | Central repository for all risk information

About This Reference

About the Author

Mahesh Babu Nelakurthi 🔗 LinkedIn

Sr. Quality & Reliability Engineer · Ultium Cells LLC · Ohio

I currently work as a Senior Quality and Reliability Engineer at Ultium Cells LLC — a GM and LG joint venture at the forefront of America's push toward electric mobility. In advanced, high-volume manufacturing, the stakes of getting quality right are real and immediate. That environment teaches you quickly that the most dependable approach is to think in first principles: go back to what is actually known, build your reasoning from there, and let the data show you what the process is really doing.

That experience also reinforced a conviction I have held for a long time: that the fundamentals matter most. Not because advanced methods are unimportant — but because the right basic question, asked precisely, almost always points to the answer. Quality Datalabs is built around that idea: a resource grounded in first principles, free to use, and written for engineers who want to understand the why behind every decision.

Credentials & Education
🎓 American Society for Quality (ASQ) Certified Six Sigma Black Belt — CSSBB (Exp. 2029)
🎓 American Society for Quality (ASQ) Certified Quality Engineer — CQE (Exp. 2026)
📘 Master of Science in Industrial Management — Texas A&M University, Kingsville (2017)
📘 Bachelor of Technology in Chemical Engineering — Vignan's Foundation for Science, Technology & Research (2014)
Why This Exists
"We stand on the shoulders of giants. Deming, Juran, Shewhart, Taguchi, Ishikawa — they spent lifetimes building the foundations. That knowledge belongs to all of us."

Quality engineering knowledge has too often been locked behind expensive certifications, paywalled journals, and five-day seminars. This reference was built to change that — to make the full depth of quality engineering accessible to every engineer, at every level of their career.

Every Cpk we compute, every control chart we plot, every FMEA we run — these are acts of responsibility. Somewhere at the end of the supply chain is a person who will use what we make. They trust us, without knowing us, to have done the work properly.

As we stand on the shoulders of giants, we have a responsibility to be better, to strive continuously for quality products reaching the customer. That responsibility is not a burden — it is the privilege of the profession.

What This Reference Covers
📊 Six Sigma, DPMO & DMAIC
🔬 MSA — AIAG 4th Edition
📈 SPC & Process Capability
⚙️ Reliability Engineering
📐 39 Statistical Distributions
🎖️ MIL-STD-1916 & Sampling
📋 FMEA — AIAG-VDA 2019
🎓 Quality Philosophy
🏭 Supplier Quality & PPAP/APQP
🧮 Live DPMO & RPN Calculator

Have a suggestion, found an error, or want to contribute? Reach out — this reference grows through the community it serves.

🔗 Connect on LinkedIn

📬 Send an Enquiry

Found an error, have a question about the content, or want to suggest a new topic? Fill in the form — I read every submission and will get back to you directly.

* Required fields. Your email is used only to reply to your enquiry and is never shared.

Live Calculator

24 interactive calculators covering Six Sigma, Probability, Reliability, GR&R, DOE, SPC, and Sampling. Enter values — results update instantly. No data leaves your browser.

Six Sigma & Process Capability

Convert between sigma levels, DPMO, and capability indices. Enter any combination — all results update instantly.

📐

DPMO ↔ Sigma Level

Convert between defects per million opportunities and sigma level

σ_ST = NORM.S.INV(1 − DPMO/1,000,000) + 1.5
Enter either value — the other is computed
Results
DPMO
Sigma (LT)
Sigma (ST)
Yield %
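
Behind this card is a single standard-normal inversion. A minimal Python sketch using scipy, with the 1.5σ shift convention as stated in the formula above:

```python
from scipy.stats import norm

def dpmo_to_sigma_st(dpmo):
    """Short-term sigma level, with the conventional 1.5-sigma shift."""
    return norm.ppf(1 - dpmo / 1_000_000) + 1.5

def sigma_st_to_dpmo(sigma_st):
    return 1_000_000 * norm.sf(sigma_st - 1.5)   # sf(x) = 1 - cdf(x)

print(round(dpmo_to_sigma_st(3.4), 2))   # 6.0 -> the classic Six Sigma level
print(round(sigma_st_to_dpmo(4.0)))      # 6210 DPMO at 4 sigma (short-term)
```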
📏

Cp / Cpk / Ppk Calculator

Process capability from specification limits and process statistics

Cp = (USL−LSL)/(6σ) · Cpk = min[(USL−μ),(μ−LSL)]/(3σ)
Results
Cp
Cpk
Cpl
Cpu
DPMO (est.)
Sigma Level
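
The indices follow directly from the formulas on the card. A minimal sketch (the spec limits and process values are hypothetical):

```python
def capability(usl, lsl, mean, sigma):
    """Cp, Cpk, Cpu, Cpl from spec limits and process statistics (sigma > 0)."""
    cp  = (usl - lsl) / (6 * sigma)
    cpu = (usl - mean) / (3 * sigma)
    cpl = (mean - lsl) / (3 * sigma)
    return cp, min(cpu, cpl), cpu, cpl

cp, cpk, cpu, cpl = capability(usl=10.5, lsl=9.5, mean=10.1, sigma=0.1)
print(f"Cp={cp:.2f}  Cpk={cpk:.2f}")   # Cp=1.67  Cpk=1.33 -> capable but off-center
```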
📉

Z-Score ↔ Probability

Standard normal conversions — enter Z or probability

P(X < x) = Φ(z) = Φ((x−μ)/σ)
Results
P(X < z)
P(X > z)
P(|X| < z)
Z-Score
🎯

Sample Size for Capability Study

Minimum n to estimate Cpk with specified confidence

n ≥ 0.5 × χ²(α,2) / (Cpk_target × d²)
Results
Min. Sample Size n
Cpk Lower Bound
Recommendation

Probability

Classical probability rules, conditional probability, Bayes, and common distributions. Enter values and see step-by-step working.

🎲

Basic Probability Rules

Union, intersection, conditional — enter P(A) and P(B)

P(A∪B) = P(A)+P(B)−P(A∩B) · P(B|A) = P(A∩B)/P(A)
Results
P(A∪B)
A or B
P(A∩B)
A and B
P(A|B)
A given B
P(B|A)
B given A
P(A′)
Not A
P(B′)
Not B
🔮

Bayes' Theorem

Update probability given new evidence — posterior from prior

P(A|B) = P(B|A)·P(A) / P(B)
Results
P(A|B) — Posterior
P(B) — Evidence
Odds Ratio
Likelihood Ratio
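
Expanding P(B) by the law of total probability gives the posterior in one step. A minimal sketch with a hypothetical inspection scenario:

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B), with the evidence P(B) expanded by total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

# Hypothetical: 2% of parts are defective; the test flags 95% of defectives
# but also 8% of good parts (false alarms).
print(round(bayes_posterior(0.02, 0.95, 0.08), 3))   # 0.195 -> most flags are false
```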
🎯

Binomial Distribution

Probability of exactly k successes in n independent trials

P(X=k) = C(n,k)·pᵏ·(1−p)ⁿ⁻ᵏ
Results
P(X = k)
P(X ≤ k)
P(X ≥ k)
Mean (np)
Std Dev
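
A minimal scipy sketch of the card's outputs (the trial count and defect rate are hypothetical):

```python
from scipy.stats import binom

n, p, k = 20, 0.05, 2              # 20 trials, 5% defect rate, k = 2 defects
print(binom.pmf(k, n, p))          # P(X = 2)  ~ 0.1887
print(binom.cdf(k, n, p))          # P(X <= 2) ~ 0.9245
print(binom.sf(k - 1, n, p))       # P(X >= 2) = 1 - P(X <= 1) ~ 0.2642
print(binom.mean(n, p), binom.std(n, p))   # np = 1.0, sqrt(np(1-p)) ~ 0.975
```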
🔢

Poisson Distribution

Count events per unit — defects per part, failures per hour

P(X=k) = e⁻λ · λᵏ / k!
Results
P(X = k)
P(X ≤ k)
P(X > k)
Mean = Var

Reliability Engineering

MTBF, MTTR, availability, Weibull B-life, system reliability, and stress-strength interference — with live results.

⏱️

MTBF / MTTR / Availability

Core reliability metrics from failure and repair data

MTBF = Total Time / Failures · A = MTBF/(MTBF+MTTR)
Results
MTBF (hr)
MTTR (hr)
Availability A
Failure Rate λ
FIT Rate
per 10⁹ hr
Downtime %
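
All of these metrics chain from three inputs. A minimal sketch with a hypothetical year of machine data:

```python
def reliability_metrics(total_time_hr, failures, total_repair_hr):
    mtbf = total_time_hr / failures
    mttr = total_repair_hr / failures
    availability = mtbf / (mtbf + mttr)
    fit = 1e9 / mtbf                   # failures per 10^9 hours
    return mtbf, mttr, availability, fit

mtbf, mttr, a, fit = reliability_metrics(8760, 4, 32)   # one year, 4 failures
print(f"MTBF={mtbf:.0f} h  MTTR={mttr:.0f} h  A={a:.4f}  FIT={fit:.0f}")
# MTBF=2190 h  MTTR=8 h  A=0.9964  FIT=456621
```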
📈

Weibull B-Life & R(t)

Survival probability and B-life for any Weibull distribution

R(t) = exp[−(t/η)^β] · B_x = η·[−ln(1−x/100)]^(1/β)
Results
R(t) — Survival
F(t) — Failed
h(t) — Hazard rate
MTTF
B1 Life
B10 Life
B50 Life
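
The card's two formulas, plus the closed-form MTTF, in a minimal sketch (β and η are hypothetical; β = 2 represents a wear-out mode):

```python
import math

def weibull_r(t, beta, eta):
    return math.exp(-((t / eta) ** beta))

def b_life(x_pct, beta, eta):
    """Age by which x% of the population is expected to have failed."""
    return eta * (-math.log(1 - x_pct / 100)) ** (1 / beta)

beta, eta = 2.0, 10_000
print(round(weibull_r(5_000, beta, eta), 4))    # R(5000) = 0.7788
print(round(b_life(10, beta, eta)))             # B10 ~ 3246 h
print(round(eta * math.gamma(1 + 1 / beta)))    # MTTF = eta*Gamma(1+1/beta) ~ 8862 h
```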
🔗

System Reliability

Series, parallel, or k-out-of-n configurations — up to 5 components

Series: R = ∏Rᵢ · Parallel: R = 1−∏(1−Rᵢ) · k-out-of-n: R = Σⱼ₌ₖⁿ C(n,j)Rʲ(1−R)ⁿ⁻ʲ
Component Reliabilities (0–1) — leave blank to skip
Results
System R
System F
Components
Configuration
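
A minimal sketch of the three configurations; the k-out-of-n form assumes identical components, as the card's formula does (the reliabilities are hypothetical):

```python
from math import comb, prod

def series_r(rs):   return prod(rs)
def parallel_r(rs): return 1 - prod(1 - r for r in rs)

def k_of_n_r(k, n, r):
    """System works if at least k of n identical components survive."""
    return sum(comb(n, j) * r**j * (1 - r)**(n - j) for j in range(k, n + 1))

rs = [0.95, 0.98, 0.99]
print(round(series_r(rs), 4))        # 0.9217 -> weaker than the weakest link
print(round(parallel_r(rs), 7))      # 0.99999 -> redundancy pays
print(round(k_of_n_r(2, 3, 0.9), 3)) # 0.972
```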

Stress-Strength Interference

Reliability when both stress and strength are random variables

z = (μ_R−μ_S)/√(σ_R²+σ_S²) · Reliability = Φ(z)
Strength Distribution R ~ N(μ_R, σ_R)
Stress Distribution S ~ N(μ_S, σ_S)
Results
Reliability index z
Reliability R
P(Failure)
Safety Factor
μ_R/μ_S
Failures/Million
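
A minimal sketch of the interference calculation (the strength and stress values are hypothetical, in psi):

```python
from math import sqrt
from scipy.stats import norm

def interference(mu_strength, sd_strength, mu_stress, sd_stress):
    z = (mu_strength - mu_stress) / sqrt(sd_strength**2 + sd_stress**2)
    return z, norm.cdf(z)

z, r = interference(50_000, 3_000, 38_000, 4_000)
print(f"z={z:.2f}  R={r:.5f}  failures/million={1e6 * (1 - r):.0f}")
# z=2.40  R=0.99180  failures/million=8198
```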

GR&R / Measurement System Analysis

Gauge Repeatability & Reproducibility — enter variance components to get %GR&R, ndc, and AIAG acceptance guidance.

🔬

%GR&R from Variance Components

AIAG MSA 4th Ed. — enter EV, AV, PV standard deviations

GRR = √(EV²+AV²) · %GRR = 100×GRR/TV · ndc = 1.41×PV/GRR
Results — AIAG Criteria
GRR σ
TV σ
%GR&R
%EV
%AV
%PV
ndc
≥5 required
Decision
AIAG MSA criteria: %GR&R <10% = Acceptable · 10–30% = Conditional · >30% = Unacceptable. ndc ≥ 5 required for the gauge to distinguish parts.
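
A minimal sketch of the roll-up (the study values are hypothetical; truncating ndc to an integer follows the usual AIAG convention):

```python
from math import sqrt

def grr_summary(ev, av, pv):
    """AIAG MSA 4th Ed. roll-up; all inputs are standard deviations."""
    grr = sqrt(ev**2 + av**2)
    tv = sqrt(grr**2 + pv**2)
    return 100 * grr / tv, int(1.41 * pv / grr)

pct_grr, ndc = grr_summary(ev=0.039, av=0.023, pv=0.217)
print(f"%GRR={pct_grr:.1f}  ndc={ndc}")   # %GRR=20.4  ndc=6 -> conditional accept
```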
📐

%GR&R — Range Method (Quick)

From operator averages and range averages — AIAG short form

EV = R̄/d₂ · AV = √[(x̄_diff/d₂*)² − EV²/(n·r)]
Results
EV (Repeatability σ)
AV (Reproducibility σ)
GRR σ
%GR&R
ndc
Decision

Design of Experiments

Number of runs, resolution, and design properties for full factorial, fractional factorial, Plackett-Burman, and Taguchi designs.

🧪

Experiment Run Calculator

How many runs for your design type? Enter factors and levels.

Full: 2ᵏ · Fractional: 2ᵏ⁻ᵖ · PB: next multiple of 4 > k · Taguchi: Lₙ
Results
Runs Required
Resolution
Main Effects
2-way Interactions
Design
DF (error)
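
A minimal sketch of the run counts for three of the card's design families (Taguchi arrays come from the standard Lₙ tables, so they are omitted here):

```python
def runs_required(k, design="full", p=0):
    """Run counts for k factors (two-level designs)."""
    if design == "full":         # 2^k full factorial
        return 2 ** k
    if design == "fractional":   # 2^(k-p) fractional factorial
        return 2 ** (k - p)
    if design == "pb":           # Plackett-Burman: next multiple of 4 above k
        return 4 * (k // 4 + 1)
    raise ValueError(design)

print(runs_required(5))                      # 32
print(runs_required(5, "fractional", p=1))   # 16 -> 2^(5-1), Resolution V
print(runs_required(11, "pb"))               # 12
```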
📊

Main Effect & S/N Ratio

Factor effect magnitude and Taguchi Signal-to-Noise ratio

ME = Ȳ(+1) − Ȳ(−1) · S/N = −10·log₁₀(Σy²/n) [STB]
Main Effect Calculator
S/N Ratio (up to 5 replicates)
Results
Main Effect
% Change
S/N Ratio (dB)
Mean
Std Dev
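
A minimal sketch of the smaller-the-better S/N ratio named in the formula above (the replicate values are hypothetical):

```python
from math import log10

def sn_smaller_the_better(y):
    """Taguchi S/N ratio for a smaller-the-better response."""
    return -10 * log10(sum(v * v for v in y) / len(y))

print(round(sn_smaller_the_better([0.12, 0.15, 0.10]), 2))   # 18.06 dB
```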

Statistical Process Control

Control limits for variables and attribute charts — enter your process data to get UCL, LCL, and center line instantly.

📊

Control Limits Calculator

X̄-R, X̄-s, p, np, c, u charts — select type and enter data

UCL = CL + 3σ · X̄-R: UCL_R = D₄R̄ · UCL_X̄ = X̄̄ + A₂R̄
Control Limits
UCL (main)
Center Line
LCL (main)
UCL (range/s)
CL (range/s)
Process σ̂
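
A minimal sketch for one chart type: X̄-R at subgroup size n = 5, using the standard table constants (A₂ = 0.577, D₃ = 0, D₄ = 2.114):

```python
# X-bar & R control limits, subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(grand_mean, r_bar):
    return {
        "UCL_xbar": grand_mean + A2 * r_bar,
        "LCL_xbar": grand_mean - A2 * r_bar,
        "UCL_R": D4 * r_bar,
        "LCL_R": D3 * r_bar,
    }

print(xbar_r_limits(grand_mean=10.0, r_bar=0.4))
# {'UCL_xbar': 10.2308, 'LCL_xbar': 9.7692, 'UCL_R': 0.8456, 'LCL_R': 0.0}
```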
🎯

Capability from Control Chart

Estimate Cp, Cpk from R̄ or s̄ without raw data

σ̂ = R̄/d₂ (or s̄/c₄) · Cp = (USL−LSL)/(6σ̂) · Cpk = min(Cpu,Cpl)
Capability Indices
σ̂ (from R̄/d₂)
Cp
Cpk
Cpu
Cpl
DPMO (est.)
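
A minimal sketch of the estimate for an X̄-R chart at subgroup size n = 5 (d₂ = 2.326 from the standard tables; the example values are hypothetical):

```python
D2 = 2.326   # d2 constant for subgroup size n = 5

def capability_from_rbar(usl, lsl, grand_mean, r_bar):
    sigma_hat = r_bar / D2
    cp  = (usl - lsl) / (6 * sigma_hat)
    cpk = min(usl - grand_mean, grand_mean - lsl) / (3 * sigma_hat)
    return sigma_hat, cp, cpk

sigma_hat, cp, cpk = capability_from_rbar(10.5, 9.5, 10.0, 0.4)
print(f"sigma_hat={sigma_hat:.4f}  Cp={cp:.2f}  Cpk={cpk:.2f}")
# sigma_hat=0.1720  Cp=0.97  Cpk=0.97 -> centered but not capable
```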

Sampling & Confidence Intervals

AQL sampling plans, confidence intervals for means and proportions, and reliability demonstration sample sizes.

📦

AQL Sample Size — ANSI Z1.4

Lot-based acceptance sampling — single sampling plan

n and c from Z1.4 table · P(accept) = Σⱼ₌₀ᶜ C(n,j)·pʲ·(1−p)ⁿ⁻ʲ
Z1.4 Single Sampling Plan
Code Letter
Sample Size n
Accept ≤ c
Reject ≥ r
% Inspected
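
Once n and c are read from the Z1.4 tables, the acceptance probability is just a binomial CDF. A minimal sketch with a hypothetical plan of n = 80, c = 2:

```python
from scipy.stats import binom

def p_accept(n, c, p_defective):
    """One point on the OC curve: accept the lot if defects found <= c."""
    return binom.cdf(c, n, p_defective)

print(round(p_accept(80, 2, 0.01), 3))   # 0.953 -> good lots nearly always pass
print(round(p_accept(80, 2, 0.05), 3))   # 0.231 -> poor lots mostly rejected
```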
📏

Confidence Intervals

For mean (t-interval) and proportion (Wilson score)

CI_μ = x̄ ± t(α/2,n−1)·s/√n · CI_p = Wilson score interval
Results
Lower Bound
Point Estimate
Upper Bound
Margin of Error
t / z critical
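
A minimal sketch of the t-interval side (the Wilson score interval for proportions is omitted for brevity; the sample values are hypothetical):

```python
from math import sqrt
from scipy.stats import t

def mean_ci(xbar, s, n, conf=0.95):
    """Two-sided t-interval for a mean."""
    tcrit = t.ppf(1 - (1 - conf) / 2, df=n - 1)
    moe = tcrit * s / sqrt(n)
    return xbar - moe, xbar + moe, moe

lo, hi, moe = mean_ci(xbar=10.02, s=0.15, n=25)
print(f"95% CI: ({lo:.3f}, {hi:.3f})  MOE = {moe:.3f}")
# 95% CI: (9.958, 10.082)  MOE = 0.062
```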

Reliability Demonstration

Zero-failure test: sample size to prove R* at confidence C

n = ln(1−C)/ln(R*) · MTBF_lower = −T_total/ln(α), where α = 1−C
Results
Sample Size n
MTBF Lower
Conclusion
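
The success-run sample size is a one-line computation. A minimal sketch:

```python
from math import ceil, log

def zero_failure_n(r_target, confidence):
    """Units to test with zero failures to demonstrate R* at confidence C."""
    return ceil(log(1 - confidence) / log(r_target))

print(zero_failure_n(0.90, 0.90))   # 22 units
print(zero_failure_n(0.95, 0.90))   # 45 units
```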